Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving

Chen, Long; Sinavski, Oleg; Hünermann, Jan; Karnsund, Alice; Willmott, Andrew James; Birch, Danny; Maund, Daniel; Shotton, Jamie

doi:10.1109/icra57147.2024.10611018

Article, May 13, 2024, closed access

Indexed in Crossref

Abstract

Large Language Models (LLMs) have shown promise in the autonomous driving sector, particularly in generalization and interpretability. We introduce a unique object-level multimodal LLM architecture that merges vectorized numeric modalities with a pre-trained LLM to improve context understanding in driving situations. We also present a new dataset of 160k QA pairs derived from 10k driving scenarios, paired with high-quality control commands collected with an RL agent and question-answer pairs generated by a teacher LLM (GPT-3.5). A distinct pretraining strategy is devised to align numeric vector modalities with static LLM representations using vector captioning language data. We also introduce an evaluation metric…

Citation impact

Total citations: 147

FWCI: 47.41
Percentile: 100%
References: 61

Authors: 8

Topics & keywords

Keywords

Modality (human–computer interaction)
Computer vision
Object (grammar)
Artificial intelligence
Computer science
