Visual7W: Grounded Question Answering in Images

Zhu, Yuke; Groth, Oliver; Bernstein, Michael S.; Fei-Fei, Li

doi:10.1109/cvpr.2016.540

Preprint · Jun 1, 2016 · Closed access

Yuke Zhu · Oliver Groth · Michael S. Bernstein · Li Fei-Fei

Stanford University · Laboratoire d'Informatique de Paris-Nord · +1 more institution

Indexed in Crossref

Abstract

We have seen great progress in basic perceptual tasks such as object recognition and detection. However, AI models still fail to match humans in high-level vision tasks due to the lack of capacities for deeper reasoning. Recently the new task of visual question answering (QA) has been proposed to evaluate a model's capacity for deep image understanding. Previous works have established a loose, global association between QA sentences and images. However, many questions and answers, in practice, relate to local regions in the images. We establish a semantic link between textual descriptions and image regions by object-level grounding. It enables a new type of QA with visual answers, in addition to textual…

Citation impact

Total citations: 796

FWCI: 55.08
Percentile: 100%
References: 86

Authors: 4

Topics & keywords

Keywords

Computer science
Question answering
Task (project management)
Artificial intelligence
Object (grammar)
Natural language processing
Visualization
Perception

UN Sustainable Development Goals

Quality Education
