Conditional Prompt Learning for Vision-Language Models
Nanyang Technological University
Abstract
With the rise of powerful pre-trained vision-language models like CLIP, it becomes essential to investigate ways to adapt these models to downstream datasets. A recently proposed method named Context Optimization (CoOp) introduces the concept of prompt learning, a recent trend in NLP, to the vision domain for adapting pre-trained vision-language models. Specifically, CoOp turns context words in a prompt into a set of learnable vectors and, with only a few labeled images for learning, can achieve huge improvements over intensively tuned manual prompts. In our study, we identify a critical problem of CoOp: the learned context is not generalizable to wider unseen classes within the same dataset, suggesting that CoOp…
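As the abstract describes, CoOp replaces hand-written context words (such as "a photo of a") with M learnable vectors. These vectors are prepended to each class-name embedding, passed through a frozen text encoder, and trained with a standard classification loss. The sketch below illustrates that idea in plain Python under heavy assumptions. `text_encoder` is a hypothetical mean-pool/tanh stand-in for CLIP's text transformer, the image features are synthetic, and gradients come from finite differences instead of autograd. Every name here is illustrative and none of it is CoOp's actual code. The one property it does mirror is that only the context vectors are updated.

```python
import math
import random

DIM = 4  # toy embedding width (hypothetical; far smaller than a real CLIP model)


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def text_encoder(tokens):
    """Hypothetical frozen stand-in for CLIP's text transformer:
    mean-pool the token embeddings, apply tanh, then L2-normalize."""
    pooled = [sum(t[d] for t in tokens) / len(tokens) for d in range(DIM)]
    return normalize([math.tanh(x) for x in pooled])


def class_logits(context, class_embs, image_feat, temperature=0.5):
    """Prompt for class k is [V_1, ..., V_M, CLASS_k]; logit_k = cos(t_k, f) / T."""
    f = normalize(image_feat)
    return [sum(a * b for a, b in zip(text_encoder(context + [c]), f)) / temperature
            for c in class_embs]


def cross_entropy(logits, label):
    top = max(logits)  # subtract the max for numerical stability
    log_z = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_z - logits[label]


def loss_fn(context, class_embs, data):
    return sum(cross_entropy(class_logits(context, class_embs, x), y)
               for x, y in data) / len(data)


def train_context(context, class_embs, data, lr=0.02, steps=50, eps=1e-4):
    """Gradient descent on the context vectors ONLY; class embeddings and the
    encoder are never modified. Central finite differences stand in for
    autograd purely to keep the sketch dependency-free."""
    context = [list(v) for v in context]  # work on a copy
    for _ in range(steps):
        grads = [[0.0] * DIM for _ in context]
        for m in range(len(context)):
            for d in range(DIM):
                saved = context[m][d]
                context[m][d] = saved + eps
                up = loss_fn(context, class_embs, data)
                context[m][d] = saved - eps
                down = loss_fn(context, class_embs, data)
                context[m][d] = saved
                grads[m][d] = (up - down) / (2 * eps)
        for m in range(len(context)):
            for d in range(DIM):
                context[m][d] -= lr * grads[m][d]
    return context


def predict(context, class_embs, image_feat):
    """Any list of class-name embeddings can be slotted into the learned prompt."""
    logits = class_logits(context, class_embs, image_feat)
    return max(range(len(logits)), key=logits.__getitem__)


# Synthetic data: 3 "base" classes, 4 noisy image features per class.
rng = random.Random(0)
class_embs = [[rng.uniform(-3, 3) for _ in range(DIM)] for _ in range(3)]
data = [([x + rng.gauss(0, 0.3) for x in text_encoder([c])], k)
        for k, c in enumerate(class_embs) for _ in range(4)]
context = [[rng.gauss(0, 0.02) for _ in range(DIM)] for _ in range(2)]  # M = 2

before = loss_fn(context, class_embs, data)
learned = train_context(context, class_embs, data)
after = loss_fn(learned, class_embs, data)
print(f"loss before {before:.3f}, after {after:.3f}")
```

Because `predict` accepts any list of class embeddings, the same learned context can be paired with class names never seen during training. Whether it still works well for them is the generalization question the abstract raises, and this toy makes no claim about the answer.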
Citation impact
- FWCI: 77.66
- Percentile: 100%
- References: 88
Authors: 4
Topics & keywords
- Computer science
- Artificial intelligence
- Generalization
- Machine learning