Article · IEEE Sensors Journal · Feb 23, 2026 · Closed access

SCMF-Net: Sparse Self-Attention Driven Cross-Modal Fusion for Robust Detection in Complex Road Scenes

Hunan University · Centre for Artificial Intelligence and Robotics · +2 more institutions

Indexed in Crossref

Abstract

This paper introduces SCMF-Net (Sparse Cross-Modal Fusion Network), a lightweight multimodal perception framework designed to enhance representation quality and inference efficiency while minimizing computational overhead. To address the sparsity and irregular distribution of LiDAR point clouds, an intensity-aware depth encoding strategy is proposed to enhance the structural cues in the depth modality. Additionally, a dual-branch backbone is employed to further strengthen feature extraction. Building upon this, FFLSA (Feature Fusion Local Self-Attention) is introduced to enable efficient cross-modal fusion. FFLSA leverages Self-Attention Clustering (SAC) to identify salient cross-modal regions, and…

Citation impact

Total citations: 4
Field-Weighted Citation Impact (FWCI): 109.17
Percentile: 100%
References: 0

Authors

7 authors

Topics & keywords

Keywords
  • Benchmark (surveying)
  • Fusion
  • Feature (linguistics)
  • Cluster analysis
  • Encoding (memory)
  • Sensor fusion
  • Inference
  • Representation (politics)

Funding