Article · IEEE Access · Jan 1, 2025 · Gold OA

EFFResNet-ViT: A Fusion-Based Convolutional and Vision Transformer Model for Explainable Medical Image Classification

University of Electro-Communications · University of Science and Technology of China · +4 more institutions

Indexed in: Crossref, DOAJ

Abstract

The rapid advancement of medical imaging technologies requires the development of advanced, automated, and interpretable diagnostic tools for clinical decision-making. Although convolutional neural networks (CNNs) have shown significant promise in medical image analysis, they have limitations in capturing the global context and lack interpretability, thereby hindering their clinical adoption. This study presents EFFResNet-ViT, a novel hybrid deep learning (DL) model designed to address these challenges by combining EfficientNet-B0 and ResNet-50 CNN backbones with a vision transformer (ViT) module. The proposed architecture employs a feature fusion strategy to integrate the local feature extraction strengths of…

Citation impact

Total citations: 74
FWCI: 82.83
Percentile: 100%
References: 54

Authors

9 authors

Topics & keywords

Keywords
  • Computer science
  • Artificial intelligence
  • Computer vision
  • Fusion
  • Transformer
  • Pattern recognition (psychology)
  • Engineering