A Multilevel Multimodal Fusion Transformer for Remote Sensing Semantic Segmentation
Chinese University of Hong Kong, Shenzhen · Wuhan University of Science and Technology
Abstract
Accurate semantic segmentation of remote sensing data plays a crucial role in the success of geoscience research and applications. Recently, multimodal fusion-based segmentation models have attracted much attention because they outperform conventional single-modal techniques. However, most of these models perform their fusion operation using either convolutional neural networks (CNNs) or the vision transformer (ViT), resulting in insufficient local-global contextual modeling and representative capabilities. In this work, a multilevel multimodal fusion scheme called FTransUNet is proposed to provide a robust and effective multimodal fusion backbone for semantic segmentation by integrating…
Citation impact
- FWCI: 61.39
- Percentile: 100%
- References: 65
Authors: 4
Topics & keywords
- Computer science
- Segmentation
- Fusion
- Artificial intelligence
- Transformer
- Computer vision
- Remote sensing
- Image segmentation