A Multilevel Multimodal Fusion Transformer for Remote Sensing Semantic Segmentation

Ma, Xianping; Zhang, Xiaokang; Pun, Man-On; Liu, Ming

doi:10.1109/tgrs.2024.3373033

Article · IEEE Transactions on Geoscience and Remote Sensing · Jan 1, 2024 · Closed access

Xianping Ma · Xiaokang Zhang · Man-On Pun · Ming Liu

Chinese University of Hong Kong, Shenzhen · Wuhan University of Science and Technology

Indexed in Crossref

Abstract

Accurate semantic segmentation of remote sensing data plays a crucial role in the success of geoscience research and applications. Recently, multimodal fusion-based segmentation models have attracted much attention due to their outstanding performance compared with conventional single-modal techniques. However, most of these models perform their fusion operation using convolutional neural networks (CNNs) or the vision transformer (ViT), resulting in insufficient local-global contextual modeling and representative capabilities. In this work, a multilevel multimodal fusion scheme called FTransUNet is proposed to provide a robust and effective multimodal fusion backbone for semantic segmentation by integrating…

Citation impact

Total citations: 208

FWCI: 61.39
Percentile: 100%
References: 65

Authors: 4

Topics & keywords

Keywords

Computer science
Segmentation
Fusion
Artificial intelligence
Transformer
Computer vision
Remote sensing
Image segmentation

Funding

National Natural Science Foundation of China
Awards: 42371374, 41801323