Preprint | arXiv (Cornell University) | Jun 2, 2022 | Green OA

EfficientFormer: Vision Transformers at MobileNet Speed

Indexed in: arXiv, DataCite

Abstract

Vision Transformers (ViT) have shown rapid progress in computer vision tasks, achieving promising results on various benchmarks. However, due to their massive number of parameters and design choices such as the attention mechanism, ViT-based models are generally several times slower than lightweight convolutional networks. Deploying ViT for real-time applications is therefore particularly challenging, especially on resource-constrained hardware such as mobile devices. Recent efforts try to reduce the computational complexity of ViT through network architecture search or hybrid designs with MobileNet blocks, yet the inference speed is still unsatisfactory. This leads to an important question: can transformers run…

Citation impact

254
total citations
References: 0

Authors

8

Topics & keywords

Keywords
  • Computer science
  • Computation
  • Inference
  • Latency (audio)
  • Mobile device
  • Software deployment
  • Architecture
  • Transformer