EfficientFormer: Vision Transformers at MobileNet Speed

Li, Yanyu; Yuan, Geng; Wen, Yang; Hu, Eric; Evangelidis, Georgios; Tulyakov, Sergey; Wang, Yanzhi; Ren, Jian

doi:10.48550/arxiv.2206.01191

Preprint, arXiv (Cornell University), Jun 2, 2022, Green OA

Indexed in: arXiv, DataCite

Abstract

Vision Transformers (ViT) have shown rapid progress in computer vision tasks, achieving promising results on various benchmarks. However, due to their massive number of parameters and their model design, e.g., the attention mechanism, ViT-based models are generally several times slower than lightweight convolutional networks. Therefore, deploying ViT for real-time applications is particularly challenging, especially on resource-constrained hardware such as mobile devices. Recent efforts try to reduce the computational complexity of ViT through network architecture search or hybrid designs with MobileNet blocks, yet the inference speed is still unsatisfactory. This leads to an important question: can transformers run…

Citation impact

Total citations: 254

FWCI: —
Percentile: —
References: 0

Authors: 8

Topics & keywords

Keywords

Computer science
Computation
Inference
Latency (audio)
Mobile device
Software deployment
Architecture
Transformer
