A Survey of Visual Transformers
Chinese Academy of Sciences · Institute of Computing Technology · +5 more institutions
Abstract
Transformer, an attention-based encoder-decoder model, has already revolutionized the field of natural language processing (NLP). Inspired by such significant achievements, some pioneering works have recently employed Transformer-like architectures in the computer vision (CV) field, demonstrating their effectiveness on three fundamental CV tasks (classification, detection, and segmentation) as well as multiple sensory data streams (images, point clouds, and vision-language data). Because of their competitive modeling capabilities, visual Transformers have achieved impressive performance improvements over multiple benchmarks as compared with modern convolutional neural networks (CNNs).…
Citation impact
- FWCI: 50.69
- Percentile: 100%
- References: 228
Authors (10)
- Yang Liu (corresponding author): Chinese Academy of Sciences, Institute of Computing Technology, University of Chinese Academy of Sciences
- Yao Zhang: Chinese Academy of Sciences, Institute of Computing Technology, Lenovo (China), University of Chinese Academy of Sciences
- Yixin Wang: Palo Alto University, Stanford University
- Feng Hou: Chinese Academy of Sciences, Institute of Computing Technology, University of Chinese Academy of Sciences
- Jin Yuan: Southeast University
Topics & keywords
- Transformer
- Computer science
- Encoder
- Segmentation
- Artificial intelligence
- Convolutional neural network
- Engineering
- Electrical engineering