Article · Jun 1, 2023 · Closed access

EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention

Chinese University of Hong Kong · Microsoft Research (United Kingdom)

Indexed in Crossref

Abstract

Vision transformers have shown great success due to their high model capabilities. However, their remarkable performance is accompanied by heavy computation costs, which makes them unsuitable for real-time applications. In this paper, we propose a family of high-speed vision transformers named EfficientViT. We find that the speed of existing transformer models is commonly bounded by memory-inefficient operations, especially the tensor reshaping and element-wise functions in MHSA. Therefore, we design a new building block with a sandwich layout, i.e., using a single memory-bound MHSA between efficient FFN layers, which improves memory efficiency while enhancing channel communication. Moreover, we discover that…
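
The sandwich layout mentioned in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: plain single-head self-attention stands in for MHSA (the cascaded group attention of the title is not modelled), normalization is omitted, and the helper names (`make_ffn`, `make_attention`, `make_sandwich_block`) and the number of FFN layers per side (`n_ffn`) are assumptions made for illustration only.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two equally shaped matrices (residual connection)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Random Gaussian weights; a stand-in for trained parameters."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def make_ffn(dim, hidden, rng):
    """Two-layer feed-forward network with ReLU, applied per token."""
    w1, w2 = rand_matrix(dim, hidden, rng), rand_matrix(hidden, dim, rng)

    def ffn(x):
        h = [[max(0.0, v) for v in row] for row in matmul(x, w1)]
        return matmul(h, w2)

    return ffn


def make_attention(dim, rng):
    """Single-head self-attention (assumed stand-in for the paper's MHSA)."""
    wq, wk, wv = (rand_matrix(dim, dim, rng) for _ in range(3))

    def attn(x):
        q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
        k_t = [list(col) for col in zip(*k)]
        scores = matmul(q, k_t)
        weights = [softmax([s / math.sqrt(dim) for s in row]) for row in scores]
        return matmul(weights, v)

    return attn


def make_sandwich_block(dim, hidden, n_ffn, rng):
    """n_ffn FFN layers, then ONE attention layer, then n_ffn FFN layers."""
    layers = (
        [("ffn", make_ffn(dim, hidden, rng)) for _ in range(n_ffn)]
        + [("attn", make_attention(dim, rng))]
        + [("ffn", make_ffn(dim, hidden, rng)) for _ in range(n_ffn)]
    )

    def block(x):
        for _, layer in layers:
            x = add(x, layer(x))  # residual around every sub-layer
        return x

    return block, [name for name, _ in layers]


if __name__ == "__main__":
    block, names = make_sandwich_block(dim=4, hidden=8, n_ffn=2, rng=random.Random(0))
    tokens = rand_matrix(3, 4, random.Random(1), scale=1.0)  # 3 tokens, 4 channels
    print(names)
    print(len(block(tokens)), "tokens out")
```

The point of the sketch is the layer ordering only: the single attention layer is the one memory-bound step, and the surrounding FFN layers mix channels without tensor reshaping.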

Citation impact

Total citations: 719
FWCI: 89.36
Percentile: 100%
References: 116

Authors

6

Topics & keywords

Keywords
  • Computer science
  • Computation
  • Xeon
  • Parallel computing
  • Speedup
  • Redundancy (engineering)
  • Transformer
  • Application-specific integrated circuit