A Survey on Multimodal Large Language Models for Autonomous Driving

Cui, Can; Ma, Yunsheng; Cao, Xu; Ye, Wenqian; Zhou, Yang; Liang, Kaizhao; Chen, Jintai; Lu, Juanwu; Yang, Zichong; Liao, Kuei-Da; Gao, Tianren; Li, Erlong; Kun, Tang; Cao, Zhipeng; Zhou, Tong; Liu, Ao; Yan, Xinrui; Mei, Shuqi; Cao, Jianguo; Wang, Ziran; Zheng, Chao

doi:10.1109/wacvw60836.2024.00106

Article · Jan 1, 2024 · Closed access

Purdue University West Lafayette · University of Illinois Urbana-Champaign · +3 more institutions

Indexed in Crossref

Abstract

With the emergence of Large Language Models (LLMs) and Vision Foundation Models (VFMs), multimodal AI systems built on large models have the potential to perceive the real world, make decisions, and control tools as well as humans do. In recent months, LLMs have attracted widespread attention in autonomous driving and map systems. Despite this immense potential, there is still no comprehensive understanding of the key challenges, opportunities, and future directions for applying LLMs in driving systems. In this paper, we present a systematic investigation of this field. We first introduce the background of Multimodal Large Language Models (MLLMs), the development of multimodal models using LLMs, and the history of…

Citation impact

Total citations: 270

FWCI: 60.58
Percentile: 100%
References: 223

Authors: 21

Topics & keywords

Topics

Keywords

Computer science
Human–computer interaction