MaPLe: Multi-modal Prompt Learning

Khattak, Muhammad Uzair; Rasheed, Hanoona; Maaz, Muhammad; Khan, Salman; Khan, Fahad Shahbaz

doi:10.1109/cvpr52729.2023.01832

Article · Jun 1, 2023 · Closed access

Mohamed bin Zayed University of Artificial Intelligence · Australian National University

Indexed in Crossref

Abstract

Pre-trained vision-language (V-L) models such as CLIP have shown excellent generalization ability to downstream tasks. However, they are sensitive to the choice of input text prompts and require careful selection of prompt templates to perform well. Inspired by the Natural Language Processing (NLP) literature, recent CLIP adaptation approaches learn prompts as the textual inputs to fine-tune CLIP for downstream tasks. We note that using prompting to adapt representations in a single branch of CLIP (language or vision) is sub-optimal since it does not allow the flexibility to dynamically adjust both representation spaces on a downstream task. In this work, we propose Multi-modal Prompt Learning (MaPLe) for both…

Citation impact

Total citations: 718

FWCI: 82.06
Percentile: 100%
References: 77

Authors: 5

Topics & keywords

Keywords

Computer science
Generalization
Artificial intelligence
Context (archaeology)
Modal
Flexibility (engineering)
Representation (politics)
Machine learning

UN Sustainable Development Goals

Quality Education
