DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting

Tsinghua University

Indexed in Crossref

Abstract

Recent progress has shown that large-scale pre-training using contrastive image-text pairs can be a promising alternative for high-quality visual representation learning from natural language supervision. Benefiting from a broader source of supervision, this new paradigm exhibits impressive transferability to downstream classification tasks and datasets. However, the problem of transferring the knowledge learned from image-text pairs to more complex dense prediction tasks has barely been visited. In this work, we present a new framework for dense prediction by implicitly and explicitly leveraging the pre-trained knowledge from CLIP. Specifically, we convert the original image-text matching problem in CLIP to a…

Citation impact

  • Total citations: 536
  • FWCI: 29.68
  • Percentile: 100%
  • References: 72

Authors

8

Topics & keywords

Keywords
  • Computer science
  • Artificial intelligence
  • Exploit
  • Context (archaeology)
  • Segmentation
  • Natural language processing
  • Matching (statistics)
  • Machine learning
UN Sustainable Development Goals
  • Quality Education

Funding