Preprint · Jun 1, 2016 · Closed access

Visual7W: Grounded Question Answering in Images

Stanford University · Laboratoire d'Informatique de Paris-Nord · +1 more institution

Indexed in Crossref

Abstract

We have seen great progress in basic perceptual tasks such as object recognition and detection. However, AI models still fail to match humans in high-level vision tasks due to the lack of capacities for deeper reasoning. Recently the new task of visual question answering (QA) has been proposed to evaluate a model's capacity for deep image understanding. Previous works have established a loose, global association between QA sentences and images. However, many questions and answers, in practice, relate to local regions in the images. We establish a semantic link between textual descriptions and image regions by object-level grounding. It enables a new type of QA with visual answers, in addition to textual…

Citation impact

Total citations: 796
FWCI: 55.08
Percentile: 100%
References: 86

Authors

4 authors

Topics & keywords

Keywords
  • Computer science
  • Question answering
  • Task (project management)
  • Artificial intelligence
  • Object (grammar)
  • Natural language processing
  • Visualization
  • Perception
UN Sustainable Development Goals
  • Quality Education