TransFG: A Transformer Architecture for Fine-Grained Recognition

Johns Hopkins University · Max Planck Institute for Informatics

Indexed in Crossref

Abstract

Fine-grained visual classification (FGVC), which aims at recognizing objects from subcategories, is a very challenging task due to the inherently subtle inter-class differences. Most existing works tackle this problem by reusing the backbone network to extract features of detected discriminative regions. However, this strategy inevitably complicates the pipeline and pushes the proposed regions to contain most parts of the objects, and thus fails to locate the truly important parts. Recently, the vision transformer (ViT) has shown strong performance in the traditional classification task. The self-attention mechanism of the transformer links every patch token to the classification token. In this work, we first…
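The abstract's last technical point, that self-attention links every patch token to the classification token, can be made concrete with a small sketch. Assuming a single attention head in a single layer, the classification token's row of the attention matrix scores how strongly each patch feeds into that token, and the highest-scoring patches are natural candidates for discriminative regions. The function names (`cls_attention`, `top_k_patches`), the toy 2-D embeddings, and the identity query/key projections are all hypothetical. This is a generic illustration, not TransFG's own part-selection procedure, which the truncated abstract does not describe.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the maximum score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cls_attention(queries: Sequence[Vector], keys: Sequence[Vector]) -> List[float]:
    """Scaled dot-product attention weights from the CLS query (token 0) to every key."""
    d = len(keys[0])
    q_cls = queries[0]
    scores = [sum(a * b for a, b in zip(q_cls, k)) / math.sqrt(d) for k in keys]
    return softmax(scores)


def top_k_patches(queries: Sequence[Vector], keys: Sequence[Vector], k: int) -> List[Tuple[int, float]]:
    """Return the k patch tokens (indices >= 1) that the CLS token attends to most."""
    weights = cls_attention(queries, keys)
    # Skip index 0, which is the CLS token attending to itself.
    patches = [(i, w) for i, w in enumerate(weights) if i > 0]
    patches.sort(key=lambda p: p[1], reverse=True)
    return patches[:k]


if __name__ == "__main__":
    # Toy 2-D embeddings: token 0 is CLS, tokens 1..3 are patches.
    # Queries and keys share the same vectors here (identity projections, an assumption).
    tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]
    print(top_k_patches(tokens, tokens, k=2))
```

In a real ViT the queries and keys come from learned linear projections of the token embeddings, and there is one such attention row per head and per layer; the sketch collapses all of that to one head so the CLS-to-patch link is easy to inspect.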

Citation impact

Total citations: 474
FWCI: 25.85
Percentile: 100%
References: 57

Authors

7

Topics & keywords

Keywords
  • Transformer
  • Computer science
  • Discriminative model
  • Security token
  • Artificial intelligence
  • Locality
  • Reuse
  • Machine learning
UN Sustainable Development Goals
  • Reduced inequalities