Learning to Compose Dynamic Tree Structures for Visual Contexts
Nanyang Technological University · Tencent (China)
Abstract
We propose to compose dynamic tree structures that place the objects in an image into a visual context, helping visual reasoning tasks such as scene graph generation and visual Q&A. Our visual context tree model, dubbed VCTree, has two key advantages over existing structured object representations such as chains and fully-connected graphs: 1) the efficient and expressive binary tree encodes the inherent parallel/hierarchical relationships among objects, e.g., "clothes" and "pants" usually co-occur and belong to "person"; 2) the dynamic structure varies from image to image and task to task, allowing more content-/task-specific message passing among objects. To construct a VCTree, we design a score…
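As a rough illustration of how a binary tree can hold both kinds of relationship named in the abstract, the sketch below assumes a left-child/right-sibling encoding: the left pointer stands for a hierarchical "belongs to" link and the right pointer for a parallel, co-occurring object. This encoding, and the names `Node` and `children`, are assumptions made for the example; the abstract does not specify how VCTree lays out its tree, and the sketch does not cover how the tree is constructed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """One detected object in a binary context tree (hypothetical layout)."""
    label: str
    left: Optional["Node"] = None   # assumed: first object that belongs to this one
    right: Optional["Node"] = None  # assumed: next parallel (sibling) object


def children(node: Node) -> List[str]:
    """Collect the labels of all objects that belong to `node`.

    Follows the left pointer once to reach the first child, then walks
    the chain of right pointers to visit that child's siblings.
    """
    labels = []
    child = node.left
    while child is not None:
        labels.append(child.label)
        child = child.right
    return labels


# The abstract's example: "clothes" and "pants" co-occur and belong to "person".
pants = Node("pants")
clothes = Node("clothes", right=pants)
person = Node("person", left=clothes)

print(children(person))  # ['clothes', 'pants']
```

Under this encoding every node has at most two pointers, yet a parent can still own any number of objects, which is one way a binary tree can remain compact while expressing both relation types.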
Citation impact
- FWCI: 23.27
- Percentile: 100%
- References: 91
Authors: 5
Topics & keywords
- Computer science
- Artificial intelligence
- Tree (set theory)
- Tree structure
- Scene graph
- Context (archaeology)
- Graph
- Visualization