Large-scale Multi-modal Pre-trained Models: A Comprehensive Survey

Wang, Xiao; Chen, Guangyao; Qian, Guangwu; Gao, Pengcheng; Wei, Xiao-Yong; Wang, Yaowei; Tian, Yonghong; Gao, Wen

doi:10.1007/s11633-022-1410-8

Article · Machine Intelligence Research · Jun 6, 2023 · Hybrid OA

Anhui University · Peng Cheng Laboratory · +2 more institutions

Indexed in Crossref

Abstract

With the urgent demand for generalized deep models, many pre-trained big models have been proposed, such as bidirectional encoder representations from transformers (BERT), vision transformer (ViT), and generative pre-trained transformers (GPT). Inspired by the success of these models in single domains (such as computer vision and natural language processing), multi-modal pre-trained big models have also drawn increasing attention in recent years. In this work, we give a comprehensive survey of these models and hope this paper provides new insights and helps new researchers track the most cutting-edge work. Specifically, we first introduce the background of multi-modal pre-training by reviewing the…

Citation impact

Total citations: 191

FWCI: 21.74
Percentile: 100%
References: 242

Authors: 8

Topics & keywords

Keywords

Computer science
Modal
Transformer
Deep learning
Artificial intelligence
Machine learning
Generative grammar
Visualization

UN Sustainable Development Goals

Quality Education
