Swin Transformer Embedding UNet for Remote Sensing Image Semantic Segmentation
Ministry of Education of the People's Republic of China · China University of Mining and Technology · +1 more institution
Abstract
Global context information is essential for the semantic segmentation of remote sensing (RS) images. However, most existing methods rely on convolutional neural networks (CNNs), which struggle to capture global context directly because the convolution operation is local. Inspired by the Swin transformer and its powerful global modeling capabilities, we propose a novel semantic segmentation framework for RS images called ST-UNet, which embeds the Swin transformer into the classical CNN-based U-shaped network (UNet). ST-UNet adopts a novel dual encoder structure in which the Swin transformer and the CNN run in parallel. First, we propose a spatial interaction module (SIM), which encodes spatial information in…
Citation impact
- FWCI: 51.30
- Percentile: 100%
- References: 82
Authors (6)
- Xin He (corresponding author)
Ministry of Education of the People's Republic of China, China University of Mining and Technology
- Yong Zhou
Ministry of Education of the People's Republic of China, China University of Mining and Technology
- Jiaqi Zhao
Ministry of Education of the People's Republic of China, China University of Mining and Technology
- Di Zhang
Ministry of Education of the People's Republic of China, China University of Mining and Technology
- Rui Yao
Ministry of Education of the People's Republic of China, China University of Mining and Technology
Topics & keywords
- Computer science
- Encoder
- Transformer
- Artificial intelligence
- Segmentation
- Upsampling
- Embedding
- Convolutional neural network