SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design

Yun, Seokju; Ro, Youngmin

doi:10.1109/cvpr52733.2024.00550

Article, Jun 16, 2024, Closed access

Seokju Yun, Youngmin Ro

University of Seoul

Indexed in Crossref

Abstract

Recently, efficient Vision Transformers have shown great performance with low latency on resource-constrained devices. Conventionally, they use $4\times 4$ patch embeddings and a 4-stage structure at the macro level, while utilizing sophisticated attention with multi-head configuration at the micro level. This paper aims to address computational redundancy at all design levels in a memory-efficient manner. We discover that using a larger-stride patchify stem not only reduces memory access costs but also achieves competitive performance by leveraging token representations with reduced spatial redundancy from the early stages. Furthermore, our preliminary analyses suggest that attention layers in the early stages…
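To give a rough sense of why a larger-stride patchify stem reduces the number of tokens handled in the early stages, the sketch below counts tokens for a non-overlapping patch embedding. The 224-pixel input size and the stride of 16 are assumptions chosen for illustration (the abstract only mentions the conventional $4\times 4$ embedding and a "larger-stride" stem), and the helper names `token_count` and `attention_map_entries` are hypothetical, not taken from the paper.

```python
def token_count(image_size: int, stride: int) -> int:
    """Number of tokens produced by a non-overlapping patchify stem."""
    if image_size % stride != 0:
        raise ValueError("image_size must be divisible by stride")
    side = image_size // stride
    return side * side


def attention_map_entries(tokens: int, heads: int = 1) -> int:
    """Entries in the token-by-token attention maps (quadratic in tokens)."""
    return heads * tokens * tokens


IMAGE_SIZE = 224  # assumed input resolution, not stated in the abstract

# Stride 4 matches the conventional 4x4 patch embedding from the abstract;
# stride 16 is an assumed example of a "larger-stride" stem.
for stride in (4, 16):
    n = token_count(IMAGE_SIZE, stride)
    print(f"stride={stride} tokens={n} attention_entries={attention_map_entries(n)}")
```

Under these assumptions, moving from stride 4 to stride 16 cuts the token count by a factor of 16, and the size of a full attention map by a factor of 256. This is only token arithmetic; it does not model the actual memory access cost or latency discussed in the paper.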

Citation impact

Total citations: 122

FWCI: 23.24
Percentile: 100%
References: 88

Authors: 2

Topics & keywords

Keywords

Computer science
Transformer
Macro
Computer hardware
Electrical engineering
Engineering
Voltage
Programming language

UN Sustainable Development Goals

Affordable and clean energy

Funding

National Research Foundation of Korea
Award: RS-2022-00166109