NuScenes-QA: A Multi-Modal Visual Question Answering Benchmark for Autonomous Driving Scenario

Fudan University

Indexed in Crossref

Abstract

We introduce a novel visual question answering (VQA) task in the context of autonomous driving, aiming to answer natural language questions based on street-view clues. Compared to traditional VQA tasks, VQA in autonomous driving scenarios presents more challenges. Firstly, the raw visual data are multi-modal, including images and point clouds captured by camera and LiDAR, respectively. Secondly, the data are multi-frame due to the continuous, real-time acquisition. Thirdly, the outdoor scenes exhibit both a moving foreground and a static background. Existing VQA benchmarks fail to adequately address these complexities. To bridge this gap, we propose NuScenes-QA, the first benchmark for VQA in the autonomous driving…

Citation impact

Total citations: 118
FWCI: 12.65
Percentile: 100%
References: 56

Authors

5

Topics & keywords

Keywords
  • Benchmark (surveying)
  • Modal
  • Question answering
  • Computer science
  • Artificial intelligence
  • Natural language processing
  • Cartography
  • Geography

Funding