Article | 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) | Jun 1, 2022 | Closed access
A-ViT: Adaptive Tokens for Efficient Vision Transformer
Indexed in Crossref
Abstract
We introduce A-ViT, a method that adaptively adjusts the inference cost of vision transformers (ViT) for images of different complexity. A-ViT achieves this by automatically reducing the number of tokens processed in the network as inference proceeds. We reformulate Adaptive Computation Time (ACT [17]) for this task, extending halting to discard redundant spatial tokens. The appealing architectural properties of vision transformers enable our adaptive token reduction mechanism to speed up inference without modifying the network architecture or inference hardware. We demonstrate that A-ViT requires no extra parameters or sub-network for halting, as we base the learning of…
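To make the idea of "extending halting to discard redundant spatial tokens" concrete, here is a minimal sketch of ACT-style halting applied independently to each token. It assumes per-layer halting scores in [0, 1] are already given for every token; how A-ViT actually produces those scores is not covered by the (truncated) abstract, so the scores here are plain hypothetical inputs. The function name `act_token_halting`, the threshold `eps = 0.01`, and the rule that every remaining token halts at the final layer are illustrative assumptions, not the paper's implementation.

```python
from typing import List, Tuple


def act_token_halting(
    halt_scores: List[List[float]], eps: float = 0.01
) -> Tuple[List[int], List[float], List[int]]:
    """Hypothetical sketch of per-token ACT halting.

    halt_scores[l][k] is an assumed halting score in [0, 1] for token k at
    layer l. Returns (halt_layer, remainder, active_per_layer):
      halt_layer[k]       -- layer index at which token k stops being processed
      remainder[k]        -- 1 minus the cumulative score accumulated before halting
      active_per_layer[l] -- number of tokens still processed at layer l
    """
    num_layers = len(halt_scores)
    num_tokens = len(halt_scores[0])
    cumulative = [0.0] * num_tokens
    halt_layer = [num_layers - 1] * num_tokens
    remainder = [0.0] * num_tokens
    halted = [False] * num_tokens
    active_per_layer: List[int] = []

    for layer, scores in enumerate(halt_scores):
        # Only tokens that have not halted yet are processed by this layer.
        active_per_layer.append(sum(1 for h in halted if not h))
        for k in range(num_tokens):
            if halted[k]:
                continue  # halted tokens are dropped from all later layers
            # ACT rule: halt once the cumulative score reaches 1 - eps.
            # Assumption: any token still active at the last layer halts there.
            if cumulative[k] + scores[k] >= 1.0 - eps or layer == num_layers - 1:
                halt_layer[k] = layer
                remainder[k] = 1.0 - cumulative[k]
                halted[k] = True
            else:
                cumulative[k] += scores[k]

    return halt_layer, remainder, active_per_layer


if __name__ == "__main__":
    # Three layers, three tokens (made-up scores for illustration only).
    scores = [[0.2, 0.995, 0.1], [0.9, 0.0, 0.1], [0.0, 0.0, 0.1]]
    print(act_token_halting(scores))
    # → ([1, 0, 2], [0.8, 1.0, 0.8], [3, 2, 1])
```

The `active_per_layer` count shrinks from 3 to 1 across the three layers, which illustrates where the savings come from: each layer sees only the tokens that have not yet halted, while the network's layers themselves are unchanged. In ACT the remainder is typically used to weight a unit's final state and to form a ponder cost; this sketch only reports it.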
Citation impact
295 total citations
FWCI: 16.11
Percentile: 100%
References: 83
Authors: 6
Topics & keywords
Keywords
- Computer science
- Inference
- Transformer
- Security token
- Artificial intelligence
- Rendering (computer graphics)
- Computation
- Regularization (linguistics)