A Survey of Visual Transformers

Liu, Yang; Zhang, Yao; Wang, Yixin; Hou, Feng; Yuan, Jin; Tian, Jiang; Zhang, Y.S.; Shi, Zhongchao; Fan, Jianping; He, Zhiqiang

doi:10.1109/tnnls.2022.3227717

Article · IEEE Transactions on Neural Networks and Learning Systems · Mar 30, 2023 · Closed access

Chinese Academy of Sciences · Institute of Computing Technology · +5 more institutions

Indexed in: Crossref, PubMed

Abstract

The Transformer, an attention-based encoder-decoder model, has already revolutionized the field of natural language processing (NLP). Inspired by these achievements, pioneering works have recently employed Transformer-like architectures in computer vision (CV), demonstrating their effectiveness on three fundamental CV tasks (classification, detection, and segmentation) as well as on multiple sensory data streams (images, point clouds, and vision-language data). Because of their competitive modeling capabilities, visual Transformers have achieved impressive performance improvements over multiple benchmarks compared with modern convolutional neural networks (CNNs).…

Citation impact

Total citations: 464

FWCI: 50.69
Percentile: 100%
References: 228

Authors: 10

Topics & keywords

Keywords

Transformer
Computer science
Encoder
Segmentation
Artificial intelligence
Convolutional neural network
Engineering
Electrical engineering
