Article · Jul 1, 2017 · Closed access

Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering

Virginia Tech · DEVCOM Army Research Laboratory · +1 more institution

Indexed in Crossref

Abstract

Problems at the intersection of vision and language are of significant importance both as challenging research questions and for the rich set of applications they enable. However, inherent structure in our world and bias in our language tend to be a simpler signal for learning than visual modalities, resulting in models that ignore visual information, leading to an inflated sense of their capability. We propose to counter these language priors for the task of Visual Question Answering (VQA) and make vision (the V in VQA) matter! Specifically, we balance the popular VQA dataset (Antol et al., ICCV 2015) by collecting complementary images such that every question in our balanced dataset is associated with not…

Citation impact

Total citations: 2,182
FWCI: 59.05
Percentile: 100%
References: 71

Authors

5

Topics & keywords

Keywords
  • Question answering
  • Computer science
  • Prior probability
  • Artificial intelligence
  • Set (abstract data type)
  • Benchmark (surveying)
  • Machine learning
  • Image (mathematics)
UN Sustainable Development Goals
  • Quality Education