NuScenes-QA: A Multi-Modal Visual Question Answering Benchmark for Autonomous Driving Scenario
Indexed in Crossref
Abstract
We introduce a novel visual question answering (VQA) task in the context of autonomous driving, aiming to answer natural language questions based on street-view clues. Compared to traditional VQA tasks, VQA in autonomous driving scenarios presents more challenges. First, the raw visual data are multi-modal, including images and point clouds captured by camera and LiDAR, respectively. Second, the data are multi-frame because of continuous, real-time acquisition. Third, outdoor scenes contain both a moving foreground and a static background. Existing VQA benchmarks fail to adequately address these complexities. To bridge this gap, we propose NuScenes-QA, the first benchmark for VQA in the autonomous driving…
Citation impact
- Total citations: 118
- FWCI (Field-Weighted Citation Impact): 12.65
- Percentile: 100%
- References: 56
Authors: 5
Topics & keywords
Keywords
- Benchmark (surveying)
- Modal
- Question answering
- Computer science
- Artificial intelligence
- Natural language processing
- Cartography
- Geography