Rethinking Vision Transformers for MobileNet Size and Speed

Li, Yanyu; Ju, Hu; Wen, Yang; Evangelidis, Georgios; Salahi, Kamyar; Wang, Yanzhi; Tulyakov, Sergey; Ren, Jian

doi:10.1109/iccv51070.2023.01549

Article · Oct 1, 2023 · Closed access

Universidad del Noreste · Snap (United States) · +2 more institutions

Indexed in Crossref

Abstract

With the success of Vision Transformers (ViTs) in computer vision tasks, recent work tries to optimize the performance and complexity of ViTs to enable efficient deployment on mobile devices. Multiple approaches have been proposed to accelerate the attention mechanism, improve inefficient designs, or incorporate mobile-friendly lightweight convolutions to form hybrid architectures. However, ViT and its variants still have higher latency or considerably more parameters than lightweight CNNs, even the years-old MobileNet. In practice, latency and size are both crucial for efficient deployment on resource-constrained hardware. In this work, we investigate a central question: can transformer models run as fast as…

Citation impact

Total citations: 258

FWCI: 29.40
Percentile: 100%
References: 108

Authors: 8

Topics & keywords

Keywords

Computer science
Software deployment
Latency (audio)
Transformer
Computer engineering
Mobile device
Distributed computing
Real-time computing