Visual7W: Grounded Question Answering in Images
Stanford University · Laboratoire d'Informatique de Paris-Nord · +1 more institution
Abstract
We have seen great progress in basic perceptual tasks such as object recognition and detection. However, AI models still fail to match humans in high-level vision tasks because they lack the capacity for deeper reasoning. Recently, the new task of visual question answering (QA) has been proposed to evaluate a model's capacity for deep image understanding. Previous works have established a loose, global association between QA sentences and images. In practice, however, many questions and answers relate to local regions in the images. We establish a semantic link between textual descriptions and image regions by object-level grounding. This grounding enables a new type of QA with visual answers, in addition to textual…
Citation impact
- FWCI: 55.08
- Percentile: 100%
- References: 86
Authors (4)
Topics & keywords
- Computer science
- Question answering
- Task (project management)
- Artificial intelligence
- Object (grammar)
- Natural language processing
- Visualization
- Perception
- Quality Education