MambaOut: Do We Really Need Mamba for Vision?

Yu, Weihao; Wang, Xinchao

doi:10.1109/cvpr52734.2025.00423

Article | Jun 10, 2025 | Closed access



Affiliation: National University of Singapore

Indexed in Crossref

Abstract

Mamba, an architecture with an RNN-like token mixer based on a state space model (SSM), was recently introduced to address the quadratic complexity of the attention mechanism and was subsequently applied to vision tasks. Nevertheless, the performance of Mamba for vision is often underwhelming compared with convolutional and attention-based models. In this paper, we delve into the essence of Mamba and conceptually conclude that Mamba is ideally suited for tasks with long-sequence and autoregressive characteristics. For vision tasks, since image classification on ImageNet aligns with neither characteristic, we hypothesize that Mamba is not necessary for this task; detection and segmentation tasks on COCO or ADE20K…

Citation impact

Total citations: 119

FWCI: 812.13
Percentile: 100%
References: 53


Authors: 2

Topics & keywords

Topics

Education and Technology Integration (22%)

Keywords

Computer science
Computer vision


Funding

Ministry of Education