TransFG: A Transformer Architecture for Fine-Grained Recognition

He, Ju; Chen, Jie-Neng; Liu, Shuai; Kortylewski, Adam; Yang, Cheng; Bai, Yutong; Wang, Changhu

doi:10.1609/aaai.v36i1.19967

Article · Proceedings of the AAAI Conference on Artificial Intelligence · Jun 28, 2022 · Diamond OA

Johns Hopkins University · Max Planck Institute for Informatics

Indexed in Crossref

Abstract

Fine-grained visual classification (FGVC), which aims at recognizing objects from subcategories, is a very challenging task due to the inherently subtle inter-class differences. Most existing works tackle this problem by reusing the backbone network to extract features of detected discriminative regions. However, this strategy inevitably complicates the pipeline and pushes the proposed regions to contain most parts of the objects, and thus fails to locate the really important parts. Recently, the vision transformer (ViT) has shown strong performance in the traditional classification task. The self-attention mechanism of the transformer links every patch token to the classification token. In this work, we first…

Citation impact

Total citations: 474

FWCI: 25.85
Percentile: 100%
References: 57

Authors: 7

Topics & keywords

Keywords

Transformer
Computer science
Discriminative model
Security token
Artificial intelligence
Locality
Reuse
Machine learning

UN Sustainable Development Goals

Reduced inequalities
