Scene Graph Generation from Objects, Phrases and Region Captions
Chinese University of Hong Kong · University of Sydney · +1 more institution
Abstract
Object detection, scene graph generation and region captioning, which are three scene understanding tasks at different semantic levels, are tied together: scene graphs are generated on top of objects detected in an image with their pairwise relationship predicted, while region captioning gives a language description of the objects, their attributes, relations and other context information. In this work, to leverage the mutual connections across semantic levels, we propose a novel neural network model, termed as Multi-level Scene Description Network (denoted as MSDN), to solve the three vision tasks jointly in an end-to-end manner. Object, phrase, and caption regions are first aligned with a dynamic graph based…
Citation impact
- FWCI: 22.09
- Percentile: 100%
- References: 75
Authors
- 5
Topics & keywords
- Computer science
- Closed captioning
- Scene graph
- Artificial intelligence
- Pairwise comparison
- Leverage (statistics)
- Natural language processing
- Graph