MaPLe: Multi-modal Prompt Learning
Mohamed bin Zayed University of Artificial Intelligence · Australian National University
Abstract
Pre-trained vision-language (V-L) models such as CLIP have shown excellent generalization ability to downstream tasks. However, they are sensitive to the choice of input text prompts and require careful selection of prompt templates to perform well. Inspired by the Natural Language Processing (NLP) literature, recent CLIP adaptation approaches learn prompts as the textual inputs to fine-tune CLIP for downstream tasks. We note that using prompting to adapt representations in a single branch of CLIP (language or vision) is sub-optimal since it does not allow the flexibility to dynamically adjust both representation spaces on a downstream task. In this work, we propose Multi-modal Prompt Learning (MaPLe) for both…
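The abstract's central point is that prompts should be learned in both the language and the vision branch rather than in only one. A minimal pure-Python sketch can illustrate this. It assumes a frozen CLIP-like model whose encoders accept sequences of token embeddings. It couples the two branches by projecting the learnable text prompts into the vision embedding space. The class name `MultiModalPromptSketch`, the `linear` helper, the dimensions, and the prompt placement are all hypothetical, and none of it is the authors' implementation. Only the prompt vectors and the projection would be trainable; the encoders themselves stay frozen.

```python
import random
from typing import List, Tuple

Vector = List[float]


def linear(weight: List[Vector], bias: Vector, x: Vector) -> Vector:
    """Compute y = W x + b for a single input vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


class MultiModalPromptSketch:
    """Hypothetical learnable prompts for both branches of a frozen CLIP-like model.

    Only the text context vectors and the coupling projection are meant to be
    trained; the text and image encoders are assumed to be frozen.
    """

    def __init__(self, n_ctx: int, d_text: int, d_vision: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Learnable language-branch context vectors (n_ctx x d_text).
        self.text_ctx = [[rng.gauss(0.0, 0.02) for _ in range(d_text)] for _ in range(n_ctx)]
        # Assumed coupling: vision prompts are a linear projection of text prompts.
        self.proj_w = [[rng.gauss(0.0, 0.02) for _ in range(d_text)] for _ in range(d_vision)]
        self.proj_b = [0.0] * d_vision

    def vision_ctx(self) -> List[Vector]:
        # One vision prompt per text prompt, keeping the two branches coupled.
        return [linear(self.proj_w, self.proj_b, p) for p in self.text_ctx]

    def build_inputs(
        self, class_tokens: List[Vector], patch_tokens: List[Vector]
    ) -> Tuple[List[Vector], List[Vector]]:
        # Language branch: [ctx_1 ... ctx_n, class-name token embeddings].
        text_in = self.text_ctx + class_tokens
        # Vision branch: [patch embeddings, vision prompts]; placement is an assumption.
        vision_in = patch_tokens + self.vision_ctx()
        return text_in, vision_in

    def num_trainable(self) -> int:
        # Text context + projection weights + projection bias.
        n_ctx, d_text = len(self.text_ctx), len(self.text_ctx[0])
        d_vision = len(self.proj_b)
        return n_ctx * d_text + d_vision * d_text + d_vision
```

For example, `MultiModalPromptSketch(n_ctx=2, d_text=4, d_vision=3)` adds two prompts to each branch and has 2·4 + 3·4 + 3 = 23 trainable values. Deriving the vision prompts from the text prompts, instead of learning them independently, is one way to let both representation spaces adapt together.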
Citation impact
- FWCI: 82.06
- Percentile: 100%
- References: 77
Authors
- Muhammad Uzair Khattak (corresponding author)
Mohamed bin Zayed University of Artificial Intelligence
- Hanoona Rasheed
Mohamed bin Zayed University of Artificial Intelligence
- Muhammad Maaz
Mohamed bin Zayed University of Artificial Intelligence
- Salman Khan
Mohamed bin Zayed University of Artificial Intelligence, Australian National University
- Fahad Shahbaz Khan
Mohamed bin Zayed University of Artificial Intelligence
Topics & keywords
- Computer science
- Generalization
- Artificial intelligence
- Context
- Modal
- Flexibility
- Representation
- Machine learning
- Quality Education