SCMF-Net: Sparse Self-Attention Driven Cross-Modal Fusion for Robust Detection in Complex Road Scenes
Hunan University · Centre for Artificial Intelligence and Robotics · +2 more institutions
Abstract
This paper introduces SCMF-Net (Sparse Cross-Modal Fusion Network), a lightweight multimodal perception framework designed to enhance representation quality and inference efficiency while minimizing computational overhead. To address the sparsity and irregular distribution of LiDAR point clouds, an intensity-aware depth encoding strategy is proposed to enhance the structural cues in the depth modality. Additionally, a dual-branch backbone is employed to further strengthen feature extraction. Building upon this, FFLSA (Feature Fusion Local Self-Attention) is introduced to enable efficient cross-modal fusion. FFLSA leverages Self-Attention Clustering (SAC) to identify salient cross-modal regions, and…
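The abstract excerpt does not say how the intensity-aware depth encoding is computed. As a rough illustration of the general idea only, the sketch below projects LiDAR points onto the image plane and keeps a normalized depth value together with the return intensity of the nearest point per pixel. The function name `encode_depth_intensity`, the pinhole parameters (`fx`, `fy`, `cx`, `cy`), the `max_depth` normalization, and the two-channel output are all assumptions made for this example, not details taken from SCMF-Net.

```python
from typing import List, Sequence, Tuple

# Hypothetical point layout: (x, y, z, intensity), already in the camera frame,
# with z pointing forward and intensity expected in [0, 1].
Point = Tuple[float, float, float, float]


def encode_depth_intensity(
    points: Sequence[Point],
    width: int,
    height: int,
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    max_depth: float = 80.0,
) -> Tuple[List[List[float]], List[List[float]]]:
    """Project points with a pinhole model into sparse depth and intensity maps.

    Assumed scheme (not the paper's): each pixel stores the normalized depth and
    the intensity of the closest point that lands on it; empty pixels stay 0.0.
    """
    depth = [[0.0] * width for _ in range(height)]
    intensity = [[0.0] * width for _ in range(height)]

    for x, y, z, i in points:
        # Skip points behind the camera or beyond the normalization range.
        if z <= 0.0 or z > max_depth:
            continue
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        # Skip points that project outside the image.
        if not (0 <= u < width and 0 <= v < height):
            continue
        d = z / max_depth
        # Keep only the nearest return per pixel.
        if depth[v][u] == 0.0 or d < depth[v][u]:
            depth[v][u] = d
            intensity[v][u] = min(max(i, 0.0), 1.0)

    return depth, intensity
```

In a two-branch design like the one the abstract describes, maps of this kind could serve as the input to the depth branch alongside the RGB image, but how SCMF-Net actually combines them is not stated in this excerpt.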
Citation impact
- FWCI: 109.17
- Percentile: 100%
- References: 0
Authors (7)
- Yunze He (corresponding author), Hunan University
- Yousheng Hao, Hunan University
- Mengying Qian, Hunan University
- Baoyuan Deng, Centre for Artificial Intelligence and Robotics
- Lilian Zhang, National University of Defense Technology
Topics & keywords
- Benchmark
- Fusion
- Feature
- Cluster analysis
- Encoding
- Sensor fusion
- Inference
- Representation