Scene Graph Generation from Objects, Phrases and Region Captions

Li, Yikang; Ouyang, Wanli; Zhou, Bolei; Wang, Kun; Wang, Xiaogang

doi:10.1109/iccv.2017.142

Article · Oct 1, 2017 · Closed access

Yikang Li · Wanli Ouyang · Bolei Zhou · Kun Wang · Xiaogang Wang

Chinese University of Hong Kong · University of Sydney · +1 more institution

Indexed in Crossref

Abstract

Object detection, scene graph generation and region captioning, which are three scene understanding tasks at different semantic levels, are tied together: scene graphs are generated on top of objects detected in an image with their pairwise relationship predicted, while region captioning gives a language description of the objects, their attributes, relations and other context information. In this work, to leverage the mutual connections across semantic levels, we propose a novel neural network model, termed as Multi-level Scene Description Network (denoted as MSDN), to solve the three vision tasks jointly in an end-to-end manner. Object, phrase, and caption regions are first aligned with a dynamic graph based…

Citation impact

Total citations: 542

FWCI: 22.09
Percentile: 100%
References: 75

Authors: 5

Topics & keywords

Keywords

Computer science
Closed captioning
Scene graph
Artificial intelligence
Pairwise comparison
Leverage (statistics)
Natural language processing
Graph
