Article | Jun 10, 2025 | Closed access

MambaVision: A Hybrid Mamba-Transformer Vision Backbone

Indexed in Crossref

Abstract

We propose a novel hybrid Mamba-Transformer backbone, MambaVision, specifically tailored for vision applications. Our core contribution includes redesigning the Mamba formulation to enhance its capability for efficient modeling of visual features. Through a comprehensive ablation study, we demonstrate the feasibility of integrating Vision Transformers (ViT) with Mamba. Our results show that equipping the Mamba architecture with self-attention blocks in the final layers greatly improves its capacity to capture long-range spatial dependencies. Based on these findings, we introduce a family of MambaVision models with a hierarchical architecture to meet various design criteria. For classification on the ImageNet-1K…

Citation impact

  • Total citations: 179
  • FWCI: 178.37
  • Percentile: 100%
  • References: 0

Authors

2

Topics & keywords

Keywords
  • Transformer
  • Computer science
  • Electrical engineering
  • Engineering
  • Voltage
UN Sustainable Development Goals
  • Affordable and clean energy