Multimodal Large Language Models: A Survey
Jinan University · South China University of Technology · +1 more institution
Abstract
Multimodal language models integrate multiple data types, such as images, text, language, audio, and other heterogeneous data. While the latest large language models excel at text-based tasks, they often struggle to understand and process other data types. Multimodal models address this limitation by combining modalities, enabling a more comprehensive understanding of diverse data. This paper begins by defining the concept of multimodality and examining the historical development of multimodal algorithms. We then introduce a range of multimodal products, focusing on the efforts of major technology companies. A practical guide is provided, offering insights into the technical aspects…
Citation impact
- FWCI: 33.64
- Percentile: 100%
- References: 103
Authors: 5
Topics & keywords
- Computer science
- Natural language processing
- Artificial intelligence
- Quality Education