AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition

Chen, Shoufa; Ge, Chongjian; Zhan, Tong; Wang, Jiangliu; Song, Yibing; Wang, Jue; Luo, Ping

doi:10.48550/arxiv.2205.13535

Preprint | arXiv (Cornell University) | May 26, 2022 | Green OA

Indexed in: arXiv, DataCite

Abstract

Pretraining Vision Transformers (ViTs) has achieved great success in visual recognition. A natural next step is to adapt a ViT to various image and video recognition tasks. This adaptation is challenging because of its heavy computation and memory storage costs: each model needs an independent and complete fine-tuning process for each task, which limits its transferability across visual domains. To address this challenge, we propose an effective adaptation approach for Transformers, namely AdaptFormer, which can efficiently adapt pre-trained ViTs to many different image and video tasks. It offers several benefits over prior art. Firstly, AdaptFormer introduces lightweight…

Citation impact

Total citations: 262

FWCI: —
Percentile: —
References: 0

Authors: 7

Topics & keywords

Keywords

Computer science
Transformer
Scalability
Transferability
Artificial intelligence
Computation
Action recognition
Machine learning