Vision Transformers for Image Classification: A Comparative Survey
Taiyuan University of Technology · Nanyang Technological University · +2 more institutions
Abstract
Transformers were initially introduced for natural language processing, leveraging the self-attention mechanism. They require minimal inductive biases in their design and can function effectively as set-based architectures. Additionally, transformers excel at capturing long-range dependencies and enabling parallel processing, which allows them to outperform traditional models, such as long short-term memory (LSTM) networks, on sequence-based tasks. In recent years, transformers have been widely adopted in computer vision, driving remarkable advancements in the field. Previous surveys have provided overviews of transformer applications across various computer vision tasks, such as object detection, activity…
Citation impact
- FWCI: 108.34
- Percentile: 100%
- References: 74
Authors: 5
Topics & keywords
- Transformer
- Computer science
- Convolutional neural network
- Artificial intelligence
- Image processing
- Machine learning
- Pattern recognition (psychology)
- Engineering
- Quality Education