VMamba: Visual State Space Model

Liu, Yue; Tian, Yunjie; Zhao, Yuzhong; Yu, Hongtian; Xie, Lingxi; Wang, Yaowei; Ye, Qixiang; Jiao, Jianbin; Liu, Yunfan

doi:10.48550/arxiv.2401.10166

Preprint · arXiv (Cornell University) · Jan 18, 2024 · Green OA

Indexed in: arXiv, DataCite

Abstract

Designing computationally efficient network architectures remains an ongoing necessity in computer vision. In this paper, we adapt Mamba, a state-space language model, into VMamba, a vision backbone with linear time complexity. At the core of VMamba is a stack of Visual State-Space (VSS) blocks with the 2D Selective Scan (SS2D) module. By traversing along four scanning routes, SS2D bridges the gap between the ordered nature of 1D selective scan and the non-sequential structure of 2D vision data, which facilitates the collection of contextual information from various sources and perspectives. Based on the VSS blocks, we develop a family of VMamba architectures and accelerate them through a succession of…
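The abstract describes SS2D only at a high level. Below is a minimal pure-Python sketch of the cross-scan idea, under stated assumptions rather than as the authors' implementation: the four routes are assumed to be row-major, column-major, and the reverse of each; the per-route scan is a hypothetical fixed-coefficient recurrence h_t = a·h_{t-1} + b·x_t standing in for Mamba's input-dependent selective scan; and the four route outputs are assumed to be merged by summation. The helper names `cross_scan`, `toy_scan`, `cross_merge`, and `ss2d_sketch` are illustrative and do not come from the VMamba code.

```python
from typing import List

Grid = List[List[float]]


def cross_scan(x: Grid) -> List[List[float]]:
    """Flatten an H x W grid along four routes (assumed order):
    row-major, column-major, reversed row-major, reversed column-major."""
    h, w = len(x), len(x[0])
    row_major = [x[i][j] for i in range(h) for j in range(w)]
    col_major = [x[i][j] for j in range(w) for i in range(h)]
    return [row_major, col_major, row_major[::-1], col_major[::-1]]


def toy_scan(seq: List[float], a: float = 0.5, b: float = 1.0) -> List[float]:
    """One left-to-right pass: h_t = a * h_{t-1} + b * x_t.
    Hypothetical stand-in: a and b are fixed here, whereas a selective
    scan makes its parameters depend on the input."""
    state, out = 0.0, []
    for v in seq:
        state = a * state + b * v
        out.append(state)
    return out


def cross_merge(routes: List[List[float]], h: int, w: int) -> Grid:
    """Map each route's outputs back to their (i, j) positions and sum them."""
    y = [[0.0] * w for _ in range(h)]
    rm, cm, rm_rev, cm_rev = routes
    n = h * w
    for k in range(n):
        # Row-major index k is position (k // w, k % w); in the reversed
        # route that same position sits at index n - 1 - k.
        y[k // w][k % w] += rm[k] + rm_rev[n - 1 - k]
        # Column-major index k is position (k % h, k // h).
        y[k % h][k // h] += cm[k] + cm_rev[n - 1 - k]
    return y


def ss2d_sketch(x: Grid) -> Grid:
    """Cross-scan, run the toy scan on each route, then cross-merge."""
    h, w = len(x), len(x[0])
    scanned = [toy_scan(route) for route in cross_scan(x)]
    return cross_merge(scanned, h, w)


if __name__ == "__main__":
    print(ss2d_sketch([[1.0, 2.0], [3.0, 4.0]]))
```

In this sketch each route is a single pass over all H·W positions, so cost grows linearly with image size. Because each pass is causal, a position only sees tokens earlier in that route. Adding the reversed and transposed routes lets it also draw on tokens that come later in the row-major order.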

Citation impact: 361 total citations

FWCI: —
Percentile: —
References: 0

Authors: 9

Topics & keywords

Keywords

Computer science
Scalability
Traverse
Artificial intelligence
Visual space
Convolutional neural network
Computational complexity theory
Receptive field