RelTR: Relation Transformer for Scene Graph Generation

Cong, Yuren; Yang, Michael Ying; Rosenhahn, Bodo

doi:10.1109/tpami.2023.3268066

Article · IEEE Transactions on Pattern Analysis and Machine Intelligence · Apr 19, 2023 · Closed access

Leibniz University Hannover · University of Twente

Indexed in: Crossref, PubMed

Abstract

Different objects in the same scene are more or less related to each other, but only a limited number of these relationships are noteworthy. Inspired by Detection Transformer, which excels in object detection, we view scene graph generation as a set prediction problem. In this article, we propose an end-to-end scene graph generation model Relation Transformer (RelTR), which has an encoder-decoder architecture. The encoder reasons about the visual feature context while the decoder infers a fixed-size set of triplets subject-predicate-object using different types of attention mechanisms with coupled subject and object queries. We design a set prediction loss performing the matching between the ground truth and…
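The abstract is cut off before it describes the set prediction loss in detail, so the following is only a minimal sketch of the general idea of matching predicted triplets to ground-truth triplets one-to-one before computing a loss. It assumes a DETR-style minimum-cost assignment; the function names `triplet_cost` and `match_triplets`, the label-mismatch cost, and the example triplets are all hypothetical and are not taken from the paper. A real model would score class probabilities and box distances and use an efficient assignment solver rather than brute force.

```python
from itertools import permutations


def triplet_cost(pred, gt):
    """Toy cost (an assumption): number of mismatched slots among
    (subject, predicate, object)."""
    return sum(p != g for p, g in zip(pred, gt))


def match_triplets(preds, gts):
    """Brute-force minimum-cost one-to-one assignment.

    Returns (total_cost, assignment), where assignment[i] is the index of
    the prediction matched to gts[i]. Only practical for tiny sets.
    """
    assert len(preds) >= len(gts), "need at least as many predictions as ground truths"
    best_cost, best_assign = None, None
    for assign in permutations(range(len(preds)), len(gts)):
        cost = sum(triplet_cost(preds[j], gts[i]) for i, j in enumerate(assign))
        if best_cost is None or cost < best_cost:
            best_cost, best_assign = cost, assign
    return best_cost, best_assign


# Hypothetical fixed-size prediction set and a smaller ground-truth set.
preds = [("man", "riding", "horse"), ("dog", "on", "grass"), ("man", "wearing", "hat")]
gts = [("man", "wearing", "hat"), ("man", "riding", "horse")]

cost, assign = match_triplets(preds, gts)
print(cost, assign)
```

Predictions left unmatched (here, the second one) would typically be treated as "no relation" by the loss; that handling is likewise an assumption of this sketch.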

Citation impact

Total citations: 205

FWCI: 23.21
Percentile: 100%
References: 105

Authors: 3

Topics & keywords

Keywords

Computer science
Scene graph
Artificial intelligence
Transformer
Inference
Encoder
Ground truth
Pattern recognition (psychology)
