EFFResNet-ViT: A Fusion-Based Convolutional and Vision Transformer Model for Explainable Medical Image Classification
University of Electro-Communications · University of Science and Technology of China · +4 more institutions
Abstract
The rapid advancement of medical imaging technologies requires the development of advanced, automated, and interpretable diagnostic tools for clinical decision-making. Although convolutional neural networks (CNNs) have shown significant promise in medical image analysis, they have limitations in capturing the global context and lack interpretability, thereby hindering their clinical adoption. This study presents EFFResNet-ViT, a novel hybrid deep learning (DL) model designed to address these challenges by combining EfficientNet-B0 and ResNet-50 CNN backbones with a vision transformer (ViT) module. The proposed architecture employs a feature fusion strategy to integrate the local feature extraction strengths of…
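As a rough illustration of what fusing two CNN backbones ahead of a transformer could look like, the sketch below concatenates two feature maps along the channel axis and flattens the result into one token per spatial location. The abstract is cut off before it specifies the fusion strategy, so channel concatenation is an assumption here, not the authors' confirmed method. The helper names `fuse_and_tokenize` and `constant_map` are hypothetical. The shapes used (1280 and 2048 channels on a 7×7 grid) are the standard final feature-map sizes of EfficientNet-B0 and ResNet-50 for a 224×224 input.

```python
def fuse_and_tokenize(feat_a, feat_b):
    """Fuse two feature maps and turn them into transformer-style tokens.

    Each input is a nested list shaped [channels][height][width].
    Assumption: fusion is plain channel-wise concatenation.
    """
    h, w = len(feat_a[0]), len(feat_a[0][0])
    if (len(feat_b[0]), len(feat_b[0][0])) != (h, w):
        raise ValueError("spatial sizes must match before fusion")
    fused = feat_a + feat_b  # concatenate along the channel axis
    # One token per spatial location, holding every fused channel value.
    return [[ch[i][j] for ch in fused] for i in range(h) for j in range(w)]


def constant_map(channels, height, width, value):
    """Hypothetical stand-in for a backbone output (constant values)."""
    return [[[value] * width for _ in range(height)] for _ in range(channels)]


# Stand-ins for EfficientNet-B0 (1280 ch) and ResNet-50 (2048 ch) outputs.
eff_features = constant_map(1280, 7, 7, 1.0)
res_features = constant_map(2048, 7, 7, 2.0)

tokens = fuse_and_tokenize(eff_features, res_features)
print(len(tokens), len(tokens[0]))  # 49 tokens, 3328 features each
```

In a real model the tokens would typically pass through a linear projection to the transformer's embedding width before attention is applied; that step is omitted here because the source does not describe it.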
Citation impact
- FWCI: 82.83
- Percentile: 100%
- References: 54
Authors (9)
- Tahir Hussain (corresponding author)
University of Electro-Communications
- Hayaru Shouno
University of Electro-Communications
- Abid Hussain
University of Science and Technology of China
- Dostdar Hussain
Karakoram International University
- Muhammad Ismail
Karakoram International University
Topics & keywords
- Computer science
- Artificial intelligence
- Computer vision
- Fusion
- Transformer
- Pattern recognition (psychology)
- Engineering