EFFResNet-ViT: A Fusion-Based Convolutional and Vision Transformer Model for Explainable Medical Image Classification

Hussain, Tahir; Shouno, Hayaru; Hussain, Abid; Hussain, Dostdar; Ismail, Muhammad; Mir, Tatheer Hussain; Hsu, Fang-Rong; Alam, Taukir; Akhy, Shabnur Anonna

doi:10.1109/access.2025.3554184

Article · IEEE Access · Jan 1, 2025 · Gold OA

University of Electro-Communications · University of Science and Technology of China · +4 more institutions

Indexed in: Crossref, DOAJ

Abstract

The rapid advancement of medical imaging technologies requires the development of advanced, automated, and interpretable diagnostic tools for clinical decision-making. Although convolutional neural networks (CNNs) have shown significant promise in medical image analysis, they have limitations in capturing the global context and lack interpretability, thereby hindering their clinical adoption. This study presents EFFResNet-ViT, a novel hybrid deep learning (DL) model designed to address these challenges by combining EfficientNet-B0 and ResNet-50 CNN backbones with a vision transformer (ViT) module. The proposed architecture employs a feature fusion strategy to integrate the local feature extraction strengths of…
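As a rough illustration of the fusion idea the abstract describes, the following minimal sketch concatenates two feature vectors, splits the result into tokens, and passes them through one self-attention step. It is not the authors' implementation: the fusion rule (plain concatenation), the token width `TOKEN_DIM`, the identity query/key/value maps, and all function names are assumptions made for illustration. The widths 1280 and 2048 are the usual pooled feature sizes of EfficientNet-B0 and ResNet-50, and random vectors stand in for real backbone outputs.

```python
import math
import random

# Usual pooled feature widths of EfficientNet-B0 (1280) and ResNet-50 (2048).
# The fusion rule below is an assumption, not the paper's confirmed method.
EFFNET_DIM = 1280
RESNET_DIM = 2048
TOKEN_DIM = 256  # hypothetical token width for the transformer stage


def fuse_features(eff_feat, res_feat):
    """Late fusion by concatenating the two backbone feature vectors."""
    return list(eff_feat) + list(res_feat)


def to_tokens(fused, token_dim=TOKEN_DIM):
    """Split the fused vector into equal-width tokens for self-attention."""
    if len(fused) % token_dim != 0:
        raise ValueError("fused length must be a multiple of token_dim")
    return [fused[i:i + token_dim] for i in range(0, len(fused), token_dim)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product attention with identity Q/K/V maps."""
    scale = math.sqrt(len(tokens[0]))
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in tokens]
        weights = softmax(scores)
        # Each output token is a weighted mix of all tokens (global context).
        out.append([sum(w * v[j] for w, v in zip(weights, tokens))
                    for j in range(len(q))])
    return out


# Random stand-ins for the pooled outputs of the two CNN backbones.
rng = random.Random(0)
eff = [rng.gauss(0.0, 1.0) for _ in range(EFFNET_DIM)]
res = [rng.gauss(0.0, 1.0) for _ in range(RESNET_DIM)]

fused = fuse_features(eff, res)
tokens = to_tokens(fused)
attended = self_attention(tokens)
print(len(fused), len(tokens), len(attended[0]))  # 3328 13 256
```

In this sketch every output token mixes information from both backbones, which is the kind of global context a transformer stage adds on top of local CNN features. A real model would use learned projections, multiple heads, and a classification head.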

Citation impact

Total citations: 74

FWCI: 82.83
Percentile: 100%
References: 54

Authors

Total authors: 9

Tahir Hussain (corresponding), University of Electro-Communications
Hayaru Shouno, University of Electro-Communications
Abid Hussain, University of Science and Technology of China
Dostdar Hussain, Karakoram International University
Muhammad Ismail, Karakoram International University

Topics & keywords

Keywords

Computer science
Artificial intelligence
Computer vision
Fusion
Transformer
Pattern recognition (psychology)
Engineering
