CAVER: Cross-Modal View-Mixed Transformer for Bi-Modal Salient Object Detection
Dalian University of Technology
Indexed in: Crossref, PubMed
Abstract
Most of the existing bi-modal (RGB-D and RGB-T) salient object detection methods utilize the convolution operation and construct complex interweave fusion structures to achieve cross-modal information integration. The inherent local connectivity of the convolution operation constrains the performance of the convolution-based methods to a ceiling. In this work, we rethink these tasks from the perspective of global information alignment and transformation. Specifically, the proposed cross-modal view-mixed transformer (CAVER) cascades several cross-modal integration units to construct a top-down transformer-based information propagation path. CAVER treats the multi-scale and multi-modal feature integration as a…
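The contrast the abstract draws between local convolution and global information alignment can be illustrated with a generic cross-attention step, in which every RGB token attends to every token of the second modality. This is only a minimal sketch under stated assumptions, not CAVER's integration unit, whose details the truncated abstract does not give. The function name `cross_modal_attention`, the absence of learned projections, and the residual addition are all illustrative assumptions.

```python
import math

def cross_modal_attention(rgb_tokens, aux_tokens):
    """Generic single-head cross-attention (illustrative; not CAVER's actual unit).

    rgb_tokens: list of N feature vectors (queries), e.g. a flattened RGB feature map.
    aux_tokens: list of M feature vectors (keys and values), e.g. depth or thermal.
    Assumption: no learned query/key/value projections, to keep the sketch short.
    """
    d = len(rgb_tokens[0])
    fused, all_weights = [], []
    for q in rgb_tokens:
        # Scaled dot-product score between this RGB token and every auxiliary token.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in aux_tokens]
        # Numerically stable softmax over all auxiliary tokens (global, not local).
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of auxiliary tokens, added back to the query (residual, assumed).
        mixed = [sum(w * v[c] for w, v in zip(weights, aux_tokens)) for c in range(d)]
        fused.append([qc + mc for qc, mc in zip(q, mixed)])
        all_weights.append(weights)
    return fused, all_weights

# Toy inputs: three RGB tokens and two auxiliary tokens with two channels each.
rgb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
depth = [[0.5, 0.5], [1.0, -1.0]]
fused, weights = cross_modal_attention(rgb, depth)
```

Each row of `weights` spans all auxiliary tokens, so a single step mixes information across the whole feature map, whereas a convolution only sees a fixed local neighbourhood.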
Citation impact
- Total citations: 177
- FWCI: 20.10
- Percentile: 100%
- References: 96
Authors: 4
Topics & keywords
Keywords
- Computer science
- Modal
- Algorithm
- Artificial intelligence
- Encoder
- RGB color model
- Computer vision
- Theoretical computer science