Conditional Prompt Learning for Vision-Language Models

Nanyang Technological University

Indexed in Crossref

Abstract

With the rise of powerful pre-trained vision-language models like CLIP, it becomes essential to investigate ways to adapt these models to downstream datasets. A recently proposed method named Context Optimization (CoOp) introduces the concept of prompt learning—a recent trend in NLP—to the vision domain for adapting pre-trained vision-language models. Specifically, CoOp turns context words in a prompt into a set of learnable vectors and, with only a few labeled images for learning, can achieve huge improvements over intensively-tuned manual prompts. In our study we identify a critical problem of CoOp: the learned context is not generalizable to wider unseen classes within the same dataset, suggesting that CoOp…
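The central mechanism named in the abstract, replacing a prompt's context words with learnable vectors, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the authors' implementation: the function names are hypothetical, the mean-pooling `toy_text_encoder` stands in for a frozen pre-trained text encoder, the embeddings are random, and no training loop is shown. It shows only how one shared set of context vectors is prepended to each class-name embedding and scored against an image feature, CLIP-style, with cosine similarity and a softmax.

```python
import math
import random


def build_prompt(context, class_embedding):
    """Prepend the shared context vectors to one class-name embedding."""
    return context + [class_embedding]


def toy_text_encoder(prompt):
    """Hypothetical stand-in for a frozen text encoder: mean-pool the tokens."""
    dim = len(prompt[0])
    return [sum(tok[d] for tok in prompt) / len(prompt) for d in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def class_probabilities(image_feature, context, class_embeddings, temperature=0.01):
    """Softmax over temperature-scaled similarities, one prompt per class."""
    logits = [
        cosine(image_feature, toy_text_encoder(build_prompt(context, c))) / temperature
        for c in class_embeddings
    ]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
dim, n_ctx, n_classes = 8, 4, 3

# The context vectors are the only parameters that prompt learning would update;
# here they are just randomly initialised (assumed values, not learned ones).
context = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(n_ctx)]
class_embeddings = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_classes)]
image_feature = [rng.gauss(0.0, 1.0) for _ in range(dim)]

probs = class_probabilities(image_feature, context, class_embeddings)
print(probs)
```

In an actual prompt-learning setup, the context vectors would be updated by gradient descent on a few labeled images while the pre-trained encoders stay frozen. Because the same vectors are shared by every class, the sketch also makes it easy to see why a context fitted to one set of classes need not transfer to unseen ones.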

Citation impact

Total citations: 1,466
FWCI: 77.66
Percentile: 100%
References: 88

Authors

4

Topics & keywords

Keywords
  • Computer science
  • Artificial intelligence
  • Generalization
  • Machine learning
  • Context (archaeology)
  • Set (abstract data type)
  • Class (philosophy)
  • Code (set theory)