RelTR: Relation Transformer for Scene Graph Generation
Leibniz University Hannover · University of Twente
Abstract
Different objects in the same scene are more or less related to each other, but only a limited number of these relationships are noteworthy. Inspired by the Detection Transformer, which excels at object detection, we view scene graph generation as a set prediction problem. In this article, we propose an end-to-end scene graph generation model, the Relation Transformer (RelTR), which has an encoder-decoder architecture. The encoder reasons about the visual feature context, while the decoder infers a fixed-size set of subject-predicate-object triplets using different types of attention mechanisms with coupled subject and object queries. We design a set prediction loss performing the matching between the ground truth and…
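The set prediction view in the abstract can be illustrated with a small sketch. The abstract is cut off before it describes the loss, so everything here is an assumption: the matching is taken to be a one-to-one assignment between ground-truth triplets and prediction slots, and `triplet_cost` and `match_triplets` are hypothetical names with a toy label-mismatch cost standing in for whatever cost the paper uses. The search is brute force over permutations, which is only practical for tiny sets; it is not the paper's implementation.

```python
from itertools import permutations


def triplet_cost(pred, gt):
    """Toy cost (assumption): count of mismatched slots among
    (subject, predicate, object). A real model would score class
    probabilities and box overlap instead of comparing labels."""
    return sum(p != g for p, g in zip(pred, gt))


def match_triplets(preds, gts):
    """Assign each ground-truth triplet to a distinct prediction slot
    so that the summed cost is minimal.

    Returns (pairs, total_cost), where pairs is a list of
    (prediction_index, ground_truth_index). Unmatched prediction
    slots would be treated as "no relation" by a set prediction loss.
    """
    if len(gts) > len(preds):
        raise ValueError("need at least as many prediction slots as ground truths")
    best_cost, best_slots = None, ()
    # Try every ordered choice of len(gts) distinct prediction slots.
    for slots in permutations(range(len(preds)), len(gts)):
        cost = sum(triplet_cost(preds[s], gts[j]) for j, s in enumerate(slots))
        if best_cost is None or cost < best_cost:
            best_cost, best_slots = cost, slots
    pairs = [(s, j) for j, s in enumerate(best_slots)]
    return pairs, best_cost


# A fixed-size set of four predicted triplets and two annotated ones.
preds = [
    ("man", "riding", "horse"),
    ("dog", "on", "grass"),
    ("man", "wearing", "hat"),
    ("tree", "behind", "fence"),
]
gts = [
    ("man", "wearing", "hat"),
    ("man", "riding", "horse"),
]
pairs, total_cost = match_triplets(preds, gts)
```

Because the output is treated as a set, the order of the prediction slots carries no meaning; the matching decides which slot is compared against which annotation before any loss is computed. For realistic set sizes, a polynomial-time assignment solver would replace the permutation search.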
Citation impact
- FWCI: 23.21
- Percentile: 100%
- References: 105
Authors: 3
Topics & keywords
- Computer science
- Scene graph
- Artificial intelligence
- Transformer
- Inference
- Encoder
- Ground truth
- Pattern recognition (psychology)