VLP: A Survey on Vision-language Pre-training
Chinese Academy of Sciences · Shandong Institute of Automation · +1 more institution
Abstract
In the past few years, the emergence of pre-training models has brought uni-modal fields such as computer vision (CV) and natural language processing (NLP) into a new era. A substantial body of work has shown that such models benefit downstream uni-modal tasks and avoid the need to train a new model from scratch. Can such pre-trained models also be applied to multi-modal tasks? Researchers have explored this problem and made significant progress. This paper surveys recent advances and new frontiers in vision-language pre-training (VLP), including image-text and video-text pre-training. To give readers a better overall grasp of VLP, we first review its recent advances in five aspects: feature extraction, model…
Citation impact
- FWCI: 25.50
- Percentile: 100%
- References: 156
Authors (7)
- Feilong Chen (corresponding author)
Chinese Academy of Sciences, Shandong Institute of Automation, University of Chinese Academy of Sciences
- Duzhen Zhang
Chinese Academy of Sciences, Shandong Institute of Automation, University of Chinese Academy of Sciences
- Minglun Han
Chinese Academy of Sciences, Shandong Institute of Automation, University of Chinese Academy of Sciences
- Xiu-Yi Chen
Chinese Academy of Sciences, Shandong Institute of Automation, University of Chinese Academy of Sciences
- Jing Shi
Chinese Academy of Sciences, Shandong Institute of Automation
Topics & keywords
- Computer science
- GRASP
- Modal
- Artificial intelligence
- Field (mathematics)
- Scratch
- Architecture
- Feature (linguistics)
- Quality Education