Article · Oct 1, 2023 · Closed access

Rethinking Vision Transformers for MobileNet Size and Speed

Universidad del Noreste · Snap (United States) · +2 more institutions

Indexed in Crossref

Abstract

With the success of Vision Transformers (ViTs) in computer vision tasks, recent works try to optimize the performance and complexity of ViTs to enable efficient deployment on mobile devices. Multiple approaches have been proposed to accelerate the attention mechanism, improve inefficient designs, or incorporate mobile-friendly lightweight convolutions to form hybrid architectures. However, ViT and its variants still have higher latency or considerably more parameters than lightweight CNNs, and this holds even against the years-old MobileNet. In practice, latency and size are both crucial for efficient deployment on resource-constrained hardware. In this work, we investigate a central question: can transformer models run as fast as…

Citation impact

Total citations: 258
FWCI: 29.40
Percentile: 100%
References: 108

Authors

8

Topics & keywords

Keywords
  • Computer science
  • Software deployment
  • Latency (audio)
  • Transformer
  • Computer engineering
  • Mobile device
  • Distributed computing
  • Real-time computing