SCMF-Net: Sparse Self-Attention Driven Cross-Modal Fusion for Robust Detection in Complex Road Scenes

He, Yunze; Hao, Yousheng; Qian, Mengying; Deng, Baoyuan; Zhang, Lilian; Cheng, Liang; Wang, Yaonan

doi:10.1109/jsen.2026.3664865

Article · IEEE Sensors Journal · Feb 23, 2026 · Closed access



Hunan University · Centre for Artificial Intelligence and Robotics · +2 more institutions

Indexed in Crossref

Abstract

This paper introduces SCMF-Net (Sparse Cross-Modal Fusion Network), a lightweight multimodal perception framework designed to enhance representation quality and inference efficiency while minimizing computational overhead. To address the sparsity and irregular distribution of LiDAR point clouds, an intensity-aware depth encoding strategy is proposed to enhance the structural cues in the depth modality. Additionally, a dual-branch backbone is employed to further strengthen feature extraction. Building upon this, FFLSA (Feature Fusion Local Self-Attention) is introduced to enable efficient cross-modal fusion. FFLSA leverages Self-Attention Clustering (SAC) to identify salient cross-modal regions, and…

Citation impact

Total citations: 4

FWCI: 109.17
Percentile: 100%
References: 0

Too recent for citation history.

Authors

7 authors

Yunze He (corresponding author), Hunan University
Yousheng Hao, Hunan University
Mengying Qian, Hunan University
Baoyuan Deng, Centre for Artificial Intelligence and Robotics
Lilian Zhang, National University of Defense Technology

Topics & keywords

Keywords

Benchmark (surveying)
Fusion
Feature (linguistics)
Cluster analysis
Encoding (memory)
Sensor fusion
Inference
Representation (politics)
