Learning to Prompt for Open-Vocabulary Object Detection with Vision-Language Model

Du, Yu; Wei, Fangyun; Zhang, Zihe; Shi, Miaojing; Gao, Yue; Li, Guoqi

doi:10.1109/cvpr52688.2022.01369

Article · 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) · Jun 1, 2022 · Closed access

Tsinghua University · Microsoft Research Asia (China) · one other institution

Indexed in Crossref

Abstract

Recently, vision-language pre-training has shown great potential in open-vocabulary object detection, where detectors trained on base classes are designed to detect new classes. The class text embedding is first generated by feeding prompts to the text encoder of a pre-trained vision-language model; it is then used as the region classifier to supervise the training of a detector. The key element behind the success of this approach is a proper prompt, which requires careful word tuning and ingenious design. To avoid laborious prompt engineering, several prompt representation learning methods have been proposed for the image classification task; however, they can only be sub-optimal solutions when…
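The abstract describes class text embeddings, produced by passing prompts through a text encoder, being used as the classifier for detected regions, with the prompt optionally learned instead of hand-written. A minimal pure-Python sketch of that general idea follows. It assumes a toy mean-pooling function (`toy_text_encoder`, hypothetical) in place of a real pre-trained text encoder. The learnable context vectors, the temperature of 0.01, and all numbers are illustrative assumptions. This is not the method proposed in this paper.

```python
import math
from typing import Dict, List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def toy_text_encoder(token_embeddings: List[Vector]) -> Vector:
    """Hypothetical stand-in for a pre-trained text encoder: mean-pools tokens."""
    dim = len(token_embeddings[0])
    n = len(token_embeddings)
    return [sum(tok[i] for tok in token_embeddings) / n for i in range(dim)]


def class_text_embeddings(context: List[Vector],
                          class_tokens: Dict[str, Vector]) -> Dict[str, Vector]:
    """Build the prompt [ctx_1, ..., ctx_M, CLASS] for every class and encode it.

    The context vectors play the role of hand-written words such as
    "a photo of a"; prompt representation learning would tune them.
    """
    return {name: l2_normalize(toy_text_encoder(context + [tok]))
            for name, tok in class_tokens.items()}


def classify_region(region_embedding: Vector,
                    text_emb: Dict[str, Vector],
                    temperature: float = 0.01) -> Dict[str, float]:
    """Cosine similarity to each class text embedding, divided by the
    temperature and converted into probabilities with a stable softmax."""
    r = l2_normalize(region_embedding)
    logits = {name: sum(a * b for a, b in zip(r, t)) / temperature
              for name, t in text_emb.items()}
    m = max(logits.values())
    exps = {name: math.exp(l - m) for name, l in logits.items()}
    z = sum(exps.values())
    return {name: e / z for name, e in exps.items()}


def region_loss(probs: Dict[str, float], target: str) -> float:
    """Cross-entropy loss for a region whose ground-truth class is `target`."""
    return -math.log(probs[target])


# Usage with made-up 3-dimensional embeddings.
context = [[0.1, 0.0, 0.0], [0.0, 0.1, 0.0]]  # two hypothetical context vectors
class_tokens = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0], "car": [0.0, 0.0, 1.0]}
text_emb = class_text_embeddings(context, class_tokens)
probs = classify_region([0.9, 0.1, 0.05], text_emb)
print(max(probs, key=probs.get), region_loss(probs, "cat"))
```

Because the class names enter only through the text encoder, swapping in a class that was never seen during detector training just means encoding one more prompt. That is the property open-vocabulary detection relies on.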

Citation impact

312 total citations

FWCI: 17.47
Percentile: 100%
References: 59

Authors: 6

Topics & keywords

Keywords

Computer science
Pascal (unit)
Artificial intelligence
Object detection
Vocabulary
Classifier (UML)
Natural language processing
Detector

UN Sustainable Development Goals

Quality Education


Funding

National Key Research and Development Program of China
Awards: 2021ZD0200300, 2018AAA0102600