Article · Machine Intelligence Research · Jun 6, 2023 · Hybrid OA

Large-scale Multi-modal Pre-trained Models: A Comprehensive Survey

Anhui University · Peng Cheng Laboratory · +2 more institutions

Indexed in Crossref

Abstract

With the urgent demand for generalized deep models, many pre-trained big models have been proposed, such as bidirectional encoder representations from transformers (BERT), vision transformer (ViT), and generative pre-trained transformers (GPT). Inspired by the success of these models in single domains (such as computer vision and natural language processing), multi-modal pre-trained big models have also drawn increasing attention in recent years. In this work, we give a comprehensive survey of these models and hope this paper can provide new insights and help new researchers track the most cutting-edge works. Specifically, we first introduce the background of multi-modal pre-training by reviewing the…

Citation impact

Total citations: 191
FWCI: 21.74
Percentile: 100%
References: 242

Authors

8

Topics & keywords

Keywords
  • Computer science
  • Modal
  • Transformer
  • Deep learning
  • Artificial intelligence
  • Machine learning
  • Generative grammar
  • Visualization
UN Sustainable Development Goals
  • Quality Education

Funding