RelTR: Relation Transformer for Scene Graph Generation

Leibniz University Hannover · University of Twente

Indexed in: Crossref, PubMed

Abstract

Different objects in the same scene are more or less related to each other, but only a limited number of these relationships are noteworthy. Inspired by Detection Transformer, which excels in object detection, we view scene graph generation as a set prediction problem. In this article, we propose Relation Transformer (RelTR), an end-to-end scene graph generation model with an encoder-decoder architecture. The encoder reasons about the visual feature context, while the decoder infers a fixed-size set of subject-predicate-object triplets using different types of attention mechanisms with coupled subject and object queries. We design a set prediction loss performing the matching between the ground truth and…
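
To make the set prediction view concrete, here is a minimal sketch of matching a fixed-size set of predicted triplets to a smaller set of ground-truth triplets. It is an illustration only, not the loss defined in the paper: the cost function (a count of mismatched labels), the function names `triplet_cost` and `match_triplets`, and the example triplets are all assumptions made for this sketch. It uses brute-force search over assignments, which is only feasible for tiny sets; practical implementations typically solve the same assignment problem with the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`).

```python
from itertools import permutations


def triplet_cost(pred, gt):
    """Toy cost (an assumption): number of mismatched labels
    among (subject, predicate, object)."""
    return sum(p != g for p, g in zip(pred, gt))


def match_triplets(preds, gts):
    """Find the one-to-one assignment of predictions to ground-truth
    triplets with the lowest total cost.

    Returns (total_cost, assignment), where assignment[i] is the index
    of the prediction matched to ground-truth triplet i. Unmatched
    predictions would be treated as "no relation" by a set prediction loss.
    """
    assert len(preds) >= len(gts), "need at least as many predictions as targets"
    best_cost, best_assign = None, None
    # Enumerate every way to pick a distinct prediction for each target.
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(triplet_cost(preds[j], gts[i]) for i, j in enumerate(perm))
        if best_cost is None or cost < best_cost:
            best_cost, best_assign = cost, perm
    return best_cost, best_assign


# Hypothetical example: four predicted triplets, two ground-truth triplets.
preds = [
    ("dog", "on", "grass"),
    ("man", "holding", "cup"),
    ("man", "riding", "horse"),
    ("tree", "near", "car"),
]
gts = [
    ("man", "riding", "horse"),
    ("dog", "on", "sofa"),
]

cost, assignment = match_triplets(preds, gts)
print(cost, assignment)  # 1 (2, 0)
```

In this example the first ground-truth triplet is matched to prediction 2 at zero cost and the second to prediction 0 at a cost of one mismatched label; predictions 1 and 3 stay unmatched.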

Citation impact

Total citations: 205
FWCI: 23.21
Percentile: 100%
References: 105

Authors

3

Topics & keywords

Keywords
  • Computer science
  • Scene graph
  • Artificial intelligence
  • Transformer
  • Inference
  • Encoder
  • Ground truth
  • Pattern recognition (psychology)

Funding