Comparing Vision Transformers and Convolutional Neural Networks for Image Classification: A Literature Review

Maurício, José; Domingues, Inês; Bernardino, Jorge

doi:10.3390/app13095521

Review · Applied Sciences · Apr 28, 2023 · Gold OA

Polytechnic Institute of Coimbra

Indexed in: Crossref, DOAJ

Abstract

Transformers are models that implement a mechanism of self-attention, individually weighting the importance of each part of the input data. Their use in image classification tasks is still somewhat limited since researchers have so far chosen Convolutional Neural Networks for image classification and transformers were more targeted to Natural Language Processing (NLP) tasks. Therefore, this paper presents a literature review that shows the differences between Vision Transformers (ViT) and Convolutional Neural Networks. The state of the art that used the two architectures for image classification was reviewed and an attempt was made to understand what factors may influence the performance of the two deep…
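As an illustration of the self-attention mechanism the abstract mentions, the following is a minimal sketch in plain Python and is not taken from the paper. It assumes identity projections (queries, keys, and values are all the raw input vectors), a single head, and a toy list of three 2-dimensional "patch" vectors; the names `softmax`, `self_attention`, and `patches` are hypothetical. A real Vision Transformer would use learned projection matrices, multiple heads, and positional embeddings.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention.

    Assumption for illustration: Q = K = V = tokens (no learned projections).
    Returns the attended outputs and the attention weights per token.
    """
    d = len(tokens[0])
    outputs, weights = [], []
    for q in tokens:
        # Similarity of this token to every token, scaled by sqrt(d).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in tokens
        ]
        w = softmax(scores)  # importance assigned to each input part
        # Weighted sum of the value vectors.
        out = [
            sum(w[j] * tokens[j][i] for j in range(len(tokens)))
            for i in range(d)
        ]
        weights.append(w)
        outputs.append(out)
    return outputs, weights


# Toy input: three hypothetical image-patch embeddings.
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(patches)
```

Each row of `weights` sums to 1 and shows how strongly one patch attends to every other patch, which is the "individual weighting" of input parts described above.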

Citation impact

Total citations: 521

FWCI: 115.80
Percentile: 100%
References: 22

Authors: 3

Topics & keywords

Keywords

Computer science
Convolutional neural network
Artificial intelligence
Contextual image classification
Transformer
Weighting
Machine learning
Pattern recognition (psychology)

UN Sustainable Development Goals

Quality Education
