SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications

Shaker, Abdelrahman; Maaz, Muhammad; Rasheed, Hanoona; Khan, Salman; Yang, Ming-Hsuan; Khan, Fahad Shahbaz

doi:10.1109/iccv51070.2023.01598

Article · Oct 1, 2023 · Closed access

Abdelrahman Shaker · Muhammad Maaz · Hanoona Rasheed · Salman Khan · Ming-Hsuan Yang

Mohamed bin Zayed University of Artificial Intelligence · University of California, Merced · +2 more institutions

Indexed in Crossref

Abstract

Self-attention has become a de facto choice for capturing global context in various vision applications. However, its quadratic computational complexity with respect to image resolution limits its use in real-time applications, especially for deployment on resource-constrained mobile devices. Although hybrid approaches have been proposed to combine the advantages of convolutions and self-attention for a better speed-accuracy trade-off, the expensive matrix multiplication operations in self-attention remain a bottleneck. In this work, we introduce a novel efficient additive attention mechanism that effectively replaces the quadratic matrix multiplication operations with linear element-wise multiplications. Our…
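
The abstract is truncated here and does not spell out the formulation, so the following is only a minimal sketch of how token mixing can be done with element-wise products in time linear in the number of tokens. It assumes a learned scoring vector `w` (a hypothetical name), pools the queries into one global query with softmax weights, and multiplies each key element-wise by that global query. Input projections, normalization, and residual connections are omitted, and the details should not be read as the authors' exact layer.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(queries, keys, w):
    """Illustrative linear-cost token mixing (an assumption, not the paper's exact layer).

    queries, keys: n lists of d floats each.
    w: hypothetical learned scoring vector of d floats.
    """
    n, d = len(queries), len(w)
    # 1. One scalar score per token: O(n * d) work, no n x n matrix is formed.
    scores = [sum(q[j] * w[j] for j in range(d)) / math.sqrt(d) for q in queries]
    alpha = softmax(scores)
    # 2. Pool all queries into a single global query vector of length d.
    global_q = [sum(alpha[i] * queries[i][j] for i in range(n)) for j in range(d)]
    # 3. Element-wise product of every key with the global query: O(n * d) work.
    return [[k[j] * global_q[j] for j in range(d)] for k in keys]


# Usage with random stand-in data (8 tokens, 4 channels).
random.seed(0)
n, d = 8, 4
Q = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
K = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
w = [random.gauss(0, 1) for _ in range(d)]
out = additive_attention(Q, K, w)
print(len(out), len(out[0]))  # 8 4
```

Standard self-attention would instead build an n x n score matrix from every query-key pair, which is the quadratic cost the abstract refers to. In this sketch each step touches each token once, so the cost grows linearly with the number of tokens.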

Citation impact

Total citations: 229

FWCI: 26.12
Percentile: 100%
References: 0

Authors

6

Topics & keywords

Keywords

Computer science
Bottleneck
Matrix multiplication
Latency (audio)
Mobile device
Computer engineering
Inference
Multiplication (music)

UN Sustainable Development Goals

Decent work and economic growth
