Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering

Goyal, Yash; Khot, Tejas; Summers-Stay, Douglas; Batra, Dhruv; Parikh, Devi

doi:10.1109/cvpr.2017.670

Article · Jul 1, 2017 · Closed access

Yash Goyal · Tejas Khot · Douglas Summers-Stay · Dhruv Batra · Devi Parikh

Virginia Tech · DEVCOM Army Research Laboratory · +1 more institution

Indexed in Crossref

Abstract

Problems at the intersection of vision and language are of significant importance both as challenging research questions and for the rich set of applications they enable. However, inherent structure in our world and bias in our language tend to be a simpler signal for learning than visual modalities, resulting in models that ignore visual information, leading to an inflated sense of their capability. We propose to counter these language priors for the task of Visual Question Answering (VQA) and make vision (the V in VQA) matter! Specifically, we balance the popular VQA dataset (Antol et al., ICCV 2015) by collecting complementary images such that every question in our balanced dataset is associated with not…

Citation impact

Total citations: 2,182

FWCI: 59.05
Percentile: 100%
References: 71

Authors: 5

Topics & keywords

Keywords

Question answering
Computer science
Prior probability
Artificial intelligence
Set (abstract data type)
Benchmark (surveying)
Machine learning
Image (mathematics)

UN Sustainable Development Goals

Quality Education
