DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting

Rao, Yongming; Zhao, Wenliang; Chen, Guangyi; Tang, Yansong; Zhu, Zheng; Huang, Guan; Zhou, Jie; Lu, Jiwen

doi:10.1109/cvpr52688.2022.01755

Article; 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); Jun 1, 2022; Closed access

Tsinghua University

Indexed in Crossref

Abstract

Recent progress has shown that large-scale pre-training using contrastive image-text pairs can be a promising alternative for high-quality visual representation learning from natural language supervision. Benefiting from a broader source of supervision, this new paradigm exhibits impressive transferability to downstream classification tasks and datasets. However, the problem of transferring the knowledge learned from image-text pairs to more complex dense prediction tasks has barely been visited. In this work, we present a new framework for dense prediction by implicitly and explicitly leveraging the pre-trained knowledge from CLIP. Specifically, we convert the original image-text matching problem in CLIP to a…

Citation impact

Total citations: 536
FWCI: 29.68
Percentile: 100%
References: 72

Authors: 8

Topics & keywords

Keywords

Computer science
Artificial intelligence
Exploit
Context (archaeology)
Segmentation
Natural language processing
Matching (statistics)
Machine learning

UN Sustainable Development Goals

Quality Education

Funding

National Natural Science Foundation of China
Awards: 62125603, U1813218