Article · Machine Intelligence Research · Jan 10, 2023 · Hybrid OA

VLP: A Survey on Vision-language Pre-training

Chinese Academy of Sciences · Shandong Institute of Automation · +1 more institution

Indexed in: arXiv, Crossref

Abstract

In the past few years, the emergence of pre-training models has brought uni-modal fields such as computer vision (CV) and natural language processing (NLP) to a new era. A substantial body of work has shown that these models benefit downstream uni-modal tasks and avoid the need to train a new model from scratch. So can such pre-trained models be applied to multi-modal tasks? Researchers have explored this problem and made significant progress. This paper surveys recent advances and new frontiers in vision-language pre-training (VLP), including image-text and video-text pre-training. To give readers a better overall grasp of VLP, we first review its recent advances in five aspects: feature extraction, model…


Funding