Fusion-Mamba for Cross-Modality Object Detection

Dong, Wenhao; Zhu, Haodong; Lin, Shaohui; Luo, Xiaoyan; Shen, Yunhang; Guo, Guodong; Zhang, Baochang

doi:10.1109/tmm.2025.3599020

Article · IEEE Transactions on Multimedia · Jan 1, 2025 · Closed access

Beihang University · East China Normal University · +2 more institutions

Indexed in Crossref

Abstract

Cross-modality object detection fuses complementary information from different modalities to improve detection performance, which broadens its range of applications. However, traditional cross-modality fusion methods based on CNNs or Transformers inadequately address pseudo-target information, which disperses the model's attention and degrades object detection performance. In this paper, we investigate a novel cross-modality fusion approach that associates cross-modal features in a hidden state space using an improved Mamba with a gating attention mechanism. We propose the Fusion-Mamba Block (FMB), designed to map cross-modal features into a hidden state space for interaction, thereby…

Citation impact

Total citations: 49

FWCI: 49.21
Percentile: 100%
References: 78

Authors

7 authors

Wenhao Dong (corresponding) · Beihang University
Haodong Zhu · Beihang University
Shaohui Lin · East China Normal University
Xiaoyan Luo · Beihang University
Yunhang Shen · Tencent (China)

Topics & keywords

Keywords

Computer science
Modality (human–computer interaction)
Object (grammar)
Artificial intelligence
