Article · arXiv (Cornell University) · Dec 23, 2020 · Green OA

Training data-efficient image transformers & distillation through attention

Indexed in arXiv

Abstract

Recently, neural networks based purely on attention were shown to address image understanding tasks such as image classification. However, these visual transformers are pre-trained with hundreds of millions of images using an expensive infrastructure, thereby limiting their adoption. In this work, we produce a competitive convolution-free transformer by training on ImageNet only. We train it on a single computer in less than 3 days. Our reference vision transformer (86M parameters) achieves top-1 accuracy of 83.1% (single-crop evaluation) on ImageNet with no external data. More importantly, we introduce a teacher-student strategy specific to transformers. It relies on a distillation token ensuring that the…

Citation impact

  • Total citations: 1,049
  • FWCI: 59.76
  • Percentile: 100%
  • References: 61

Authors

6 authors

Topics & keywords

Keywords
  • Transformer
  • Computer science
  • Distillation
  • Limiting
  • Artificial intelligence
  • Artificial neural network
  • Machine learning
  • Security token
UN Sustainable Development Goals
  • Industry, innovation and infrastructure