Article · Dec 1, 2015 · Closed access

VQA: Visual Question Answering

Virginia Tech · Microsoft (United States) · +4 more institutions

Indexed in Crossref

Abstract

We propose the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about the image, the task is to provide an accurate natural language answer. Mirroring real-world scenarios, such as helping the visually impaired, both the questions and answers are open-ended. Visual questions selectively target different areas of an image, including background details and underlying context. As a result, a system that succeeds at VQA typically needs a more detailed understanding of the image and complex reasoning than a system producing generic image captions. Moreover, VQA is amenable to automatic evaluation, since many open-ended answers contain only a few words…

Citation impact

  • Total citations: 4,265
  • FWCI: 102.37
  • Percentile: 100%
  • References: 95

Authors

7

Topics & keywords

Keywords
  • Question answering
  • Mirroring
  • Computer science
  • Task (project management)
  • Context (archaeology)
  • Natural language
  • Set (abstract data type)
  • Image (mathematics)
UN Sustainable Development Goals
  • Quality Education