Article | Jun 10, 2025 | Closed access
MambaOut: Do We Really Need Mamba for Vision?
National University of Singapore
Indexed in Crossref
Abstract
Mamba, an architecture with an RNN-like token mixer based on the state space model (SSM), was recently introduced to address the quadratic complexity of the attention mechanism and was subsequently applied to vision tasks. Nevertheless, the performance of Mamba for vision is often underwhelming when compared with convolutional and attention-based models. In this paper, we delve into the essence of Mamba and conceptually conclude that Mamba is ideally suited for tasks with long-sequence and autoregressive characteristics. For vision tasks, as image classification on ImageNet does not align with either characteristic, we hypothesize that Mamba is not necessary for this task; detection and segmentation tasks on COCO or ADE20K…
Citation impact
- Total citations: 119
- FWCI: 812.13
- Percentile: 100%
- References: 53
Authors: 2
Topics & keywords
Keywords
- Computer science
- Computer vision