Large-scale Multi-modal Pre-trained Models: A Comprehensive Survey
Anhui University · Peng Cheng Laboratory · +2 more institutions
Abstract
With the urgent demand for generalized deep models, many pre-trained big models have been proposed, such as bidirectional encoder representations from transformers (BERT), vision transformer (ViT), and generative pre-trained transformers (GPT). Inspired by the success of these models in single domains (such as computer vision and natural language processing), multi-modal pre-trained big models have also drawn increasing attention in recent years. In this work, we give a comprehensive survey of these models and hope this paper can provide new insights and help new researchers track the most cutting-edge works. Specifically, we first introduce the background of multi-modal pre-training by reviewing the…
Citation impact
- FWCI: 21.74
- Percentile: 100%
- References: 242
Authors: 8
Topics & keywords
- Computer science
- Modal
- Transformer
- Deep learning
- Artificial intelligence
- Machine learning
- Generative grammar
- Visualization
- Quality Education