CAVER: Cross-Modal View-Mixed Transformer for Bi-Modal Salient Object Detection

Pang, Youwei; Zhao, Xiaoqi; Zhang, Lihe; Lu, Huchuan

doi:10.1109/tip.2023.3234702

Article, IEEE Transactions on Image Processing, Jan 1, 2023. Closed access.

Dalian University of Technology

Indexed in: Crossref, PubMed

Abstract

Most of the existing bi-modal (RGB-D and RGB-T) salient object detection methods utilize the convolution operation and construct complex interweave fusion structures to achieve cross-modal information integration. The inherent local connectivity of the convolution operation constrains the performance of the convolution-based methods to a ceiling. In this work, we rethink these tasks from the perspective of global information alignment and transformation. Specifically, the proposed cross-modal view-mixed transformer (CAVER) cascades several cross-modal integration units to construct a top-down transformer-based information propagation path. CAVER treats the multi-scale and multi-modal feature integration as a…
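The contrast the abstract draws between local convolution and global information alignment can be illustrated with a generic cross-attention step, in which every token of one modality attends to every token of the other. The sketch below is a plain scaled dot-product cross-attention in pure Python, not the CAVER integration unit itself: the abstract is truncated and does not specify the unit's internals. The function names (`softmax`, `cross_attention`), the toy token values, and the omission of learned projections are all assumptions made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query token attends to ALL key tokens.

    This is a generic illustration (no learned projections), not the
    paper's integration unit.
    """
    d = len(queries[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value vectors, channel by channel.
        outputs.append(
            [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))]
        )
    return outputs, weights


# Hypothetical toy features: 3 RGB tokens and 2 depth tokens, 2 channels each.
rgb_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
depth_tokens = [[1.0, 0.0], [0.0, 1.0]]

# RGB tokens act as queries; depth tokens supply keys and values.
fused, attn = cross_attention(rgb_tokens, depth_tokens, depth_tokens)
```

Each row of `attn` is a distribution over all depth tokens, so a given RGB position can draw on depth information from anywhere in the image rather than only from a local neighborhood, which is the limitation the abstract attributes to convolution.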

Citation impact

Total citations: 177

FWCI: 20.10
Percentile: 100%
References: 96

Authors: 4

Topics & keywords

Keywords

Computer science
Modal
Algorithm
Artificial intelligence
Encoder
RGB color model
Computer vision
Theoretical computer science
