Article · Dec 15, 2023 · Closed access

Multimodal Large Language Models: A Survey

Jinan University · South China University of Technology · +1 more institution

Indexed in Crossref

Abstract

The exploration of multimodal language models integrates multiple data types, such as images, text, language, audio, and other heterogeneous data. While the latest large language models excel in text-based tasks, they often struggle to understand and process other data types. Multimodal models address this limitation by combining various modalities, enabling a more comprehensive understanding of diverse data. This paper begins by defining the concept of multimodality and examining the historical development of multimodal algorithms. Furthermore, we introduce a range of multimodal products, focusing on the efforts of major technology companies. A practical guide is provided, offering insights into the technical aspects…

Citation impact

Total citations: 202
FWCI: 33.64
Percentile: 100%
References: 103

Authors

5

Topics & keywords

Keywords
  • Computer science
  • Natural language processing
  • Artificial intelligence
UN Sustainable Development Goals
  • Quality Education

Funding