A-ViT: Adaptive Tokens for Efficient Vision Transformer

Yin, Hongxu; Vahdat, Arash; Alvarez, José M.; Mallya, Arun; Kautz, Jan; Molchanov, Pavlo

doi:10.1109/cvpr52688.2022.01054

Article | 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) | Jun 1, 2022 | Closed access

Indexed in Crossref

Abstract

We introduce A-ViT, a method that adaptively adjusts the inference cost of vision transformers (ViT) for images of different complexity. A-ViT achieves this by automatically reducing the number of tokens that are processed in the network as inference proceeds. We reformulate Adaptive Computation Time (ACT [17]) for this task, extending halting to discard redundant spatial tokens. The appealing architectural properties of vision transformers enable our adaptive token reduction mechanism to speed up inference without modifying the network architecture or inference hardware. We demonstrate that A-ViT requires no extra parameters or sub-network for halting, as we base the learning of…
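The token-halting idea in the abstract can be illustrated with a minimal sketch in the spirit of ACT: each token accumulates a halting score across layers and is dropped from further processing once the running total reaches 1 - eps. This is not the authors' implementation. The scoring rule `halting_score`, the function `run_with_halting`, the threshold `eps`, and the toy identity layers are all assumptions made for illustration; the abstract is truncated before it says how A-ViT actually derives its halting scores.

```python
import math


def halting_score(token):
    # Hypothetical scoring rule (an assumption, not from the paper):
    # a sigmoid of the token's mean value, giving a score in (0, 1).
    mean = sum(token) / len(token)
    return 1.0 / (1.0 + math.exp(-mean))


def run_with_halting(tokens, layers, eps=0.01):
    """Apply layers in order, skipping tokens that have halted.

    A token halts once its cumulative halting score reaches 1 - eps.
    Returns the final tokens and the number of tokens processed per layer.
    """
    tokens = [list(t) for t in tokens]      # work on a copy
    cumulative = [0.0] * len(tokens)        # running halting score per token
    active = list(range(len(tokens)))       # indices still being processed
    active_per_layer = []

    for layer in layers:
        if not active:
            break                           # every token has halted
        active_per_layer.append(len(active))
        still_active = []
        for i in active:
            tokens[i] = layer(tokens[i])
            cumulative[i] += halting_score(tokens[i])
            if cumulative[i] < 1.0 - eps:
                still_active.append(i)      # keep processing this token
        active = still_active

    return tokens, active_per_layer


# Toy usage: identity "layers"; one token scores high, one scores low.
identity = lambda t: t
_, counts = run_with_halting([[10.0, 10.0], [-2.0, -2.0]], [identity] * 4)
print(counts)
```

In this toy setting the high-scoring token is dropped after the first layer, so later layers process fewer tokens, which is where the compute savings would come from. The sketch leaves out the parts of ACT that matter for training, such as the remainder-weighted output and the ponder cost.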

Citation impact

Total citations: 295

FWCI: 16.11
Percentile: 100%
References: 83

Authors: 6

Topics & keywords

Keywords

Computer science
Inference
Transformer
Security token
Artificial intelligence
Rendering (computer graphics)
Computation
Regularization (linguistics)

No related works found for this paper.