TransFG: A Transformer Architecture for Fine-Grained Recognition
Johns Hopkins University · Max Planck Institute for Informatics
Abstract
Fine-grained visual classification (FGVC), which aims to recognize objects from subcategories, is a very challenging task because inter-class differences are inherently subtle. Most existing works tackle this problem by reusing the backbone network to extract features from detected discriminative regions. However, this strategy inevitably complicates the pipeline and pushes the proposed regions to cover most of the object, and thus fails to locate the truly important parts. Recently, the vision transformer (ViT) has shown strong performance on the traditional classification task. The self-attention mechanism of the transformer links every patch token to the classification token. In this work, we first…
Citation impact
- FWCI: 25.85
- Percentile: 100%
- References: 57
Authors
- 7
Topics & keywords
- Transformer
- Computer science
- Discriminative model
- Security token
- Artificial intelligence
- Locality
- Reuse
- Machine learning
- Reduced inequalities