Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations

Krishna, Ranjay; Zhu, Yuke; Groth, Oliver; Johnson, Justin; Hata, Kenji; Kravitz, Joshua; Chen, Stephanie; Kalantidis, Yannis; Li, Li-Jia; Shamma, David A.; Bernstein, Michael S.; Fei-Fei, Li

doi:10.1007/s11263-016-0981-7

Article · International Journal of Computer Vision · Feb 6, 2017 · Hybrid OA

Stanford University · Technische Universität Dresden · +3 more institutions

Indexed in Crossref

Abstract

Despite progress in perceptual tasks such as image classification, computers still perform poorly on cognitive tasks such as image description and question answering. Cognition is core to tasks that involve not just recognizing, but reasoning about our visual world. However, models used to tackle the rich content in images for cognitive tasks are still being trained using the same datasets designed for perceptual tasks. To achieve success at cognitive tasks, models need to understand the interactions and relationships between objects in an image. When asked “What vehicle is the person riding?”, computers will need to identify the objects in an image as well as the relationships riding(man, carriage) and…

Citation impact

Total citations: 5,152
FWCI: 143.43
Percentile: 100%
References: 132

Authors

12

Topics & keywords

Keywords

Artificial intelligence
Computer science
Natural language processing
Genome
Image (mathematics)
Pattern recognition (psychology)
Computer vision
Annotation
