NuScenes-QA: A Multi-Modal Visual Question Answering Benchmark for Autonomous Driving Scenario

Qian, Tianwen; Chen, Jingjing; Zhuo, Linhai; Jiao, Yang; Jiang, Yu-Gang

doi:10.1609/aaai.v38i5.28253

Article; Proceedings of the AAAI Conference on Artificial Intelligence; Mar 24, 2024; Diamond OA

Fudan University

Indexed in Crossref

Abstract

We introduce a novel visual question answering (VQA) task in the context of autonomous driving, aiming to answer natural language questions based on street-view clues. Compared to traditional VQA tasks, VQA in autonomous driving scenarios presents more challenges. Firstly, the raw visual data are multi-modal, including images and point clouds captured by cameras and LiDAR, respectively. Secondly, the data are multi-frame due to the continuous, real-time acquisition. Thirdly, the outdoor scenes contain both a moving foreground and a static background. Existing VQA benchmarks fail to adequately address these complexities. To bridge this gap, we propose NuScenes-QA, the first benchmark for VQA in the autonomous driving…

Citation impact

Total citations: 118

FWCI: 12.65
Percentile: 100%
References: 56

Topics & keywords

Keywords

Benchmark (surveying)
Modal
Question answering
Computer science
Artificial intelligence
Natural language processing
Cartography
Geography

Funding

National Natural Science Foundation of China
Award: 62072116