Article · May 13, 2024 · Closed access
Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving
Indexed in Crossref
Abstract
Large Language Models (LLMs) have shown promise in the autonomous driving sector, particularly in generalization and interpretability. We introduce a unique object-level multimodal LLM architecture that merges vectorized numeric modalities with a pre-trained LLM to improve context understanding in driving situations. We also present a new dataset of 160k QA pairs derived from 10k driving scenarios, paired with high-quality control commands collected with an RL agent and question-answer pairs generated by a teacher LLM (GPT-3.5). A distinct pretraining strategy is devised to align numeric vector modalities with static LLM representations using vector captioning language data. We also introduce an evaluation metric…
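To make "merging vectorized numeric modalities with a pre-trained LLM" concrete, here is a minimal Python sketch under stated assumptions. It uses a simple hypothetical linear adapter that maps each object's numeric feature vector into the LLM's token-embedding space and prefixes the resulting "object tokens" to the text token embeddings. The names `make_projection`, `project`, and `fuse`, the feature layout, and the dimensions are all illustrative assumptions, not taken from the paper; this is not the authors' architecture, whose encoder details are not given in this abstract.

```python
import random
from typing import List

Vector = List[float]


def make_projection(in_dim: int, out_dim: int, seed: int = 0) -> List[Vector]:
    """Hypothetical adapter weights (out_dim x in_dim), randomly initialised.

    In an alignment-pretraining setup like the one the abstract describes, weights
    such as these would be trained while the LLM stays frozen (assumption).
    """
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def project(weights: List[Vector], x: Vector) -> Vector:
    """Matrix-vector product: maps one object's numeric features to one embedding."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def fuse(object_vectors: List[Vector], text_embeddings: List[Vector],
         weights: List[Vector]) -> List[Vector]:
    """Prefix projected object tokens to the text token embeddings.

    The returned list is the single input sequence a (frozen) LLM would consume.
    """
    object_tokens = [project(weights, v) for v in object_vectors]
    return object_tokens + text_embeddings


if __name__ == "__main__":
    # Hypothetical per-object features, e.g. x, y, vx, vy, length, width.
    objects = [[1.0, 2.0, 0.5, 0.0, 4.5, 1.8],
               [-3.0, 10.0, 0.0, 1.2, 0.6, 0.6]]
    text = [[0.0] * 8 for _ in range(4)]  # 4 placeholder text-token embeddings, dim 8
    weights = make_projection(in_dim=6, out_dim=8)
    sequence = fuse(objects, text, weights)
    print(len(sequence), len(sequence[0]))  # 6 tokens, each of dimension 8
```

Prefixing projected tokens keeps the LLM's interface unchanged: it still receives a sequence of embeddings, so only the adapter needs to learn how numeric object state maps into the language model's representation space. Real systems typically use a learned encoder (for example, attention over objects) rather than a single linear map.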
Citation impact
- Total citations: 147
- FWCI: 47.41
- Percentile: 100%
- References: 61
Authors: 8
Topics & keywords
Keywords
- Modality (human–computer interaction)
- Computer vision
- Object (grammar)
- Artificial intelligence
- Computer science