Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
Virginia Tech · DEVCOM Army Research Laboratory · +1 more institution
Abstract
Problems at the intersection of vision and language are of significant importance both as challenging research questions and for the rich set of applications they enable. However, inherent structure in our world and bias in our language tend to be a simpler signal for learning than visual modalities, resulting in models that ignore visual information, leading to an inflated sense of their capability. We propose to counter these language priors for the task of Visual Question Answering (VQA) and make vision (the V in VQA) matter! Specifically, we balance the popular VQA dataset (Antol et al., ICCV 2015) by collecting complementary images such that every question in our balanced dataset is associated with not…
Citation impact
- FWCI: 59.05
- Percentile: 100%
- References: 71
Authors
- 5
Topics & keywords
- Question answering
- Computer science
- Prior probability
- Artificial intelligence
- Set (abstract data type)
- Benchmark (surveying)
- Machine learning
- Image (mathematics)
- Quality Education