CMTFNet: CNN and Multiscale Transformer Fusion Network for Remote-Sensing Image Semantic Segmentation
Changsha University of Science and Technology
Abstract
Convolutional neural networks (CNNs) are powerful in extracting local information but lack the ability to model long-range dependencies. In contrast, transformer relies on multihead self-attention mechanisms to effectively extract the global contextual information and thus model long-range dependencies. In this paper, we propose a novel encoder-decoder structured semantic segmentation network, named as CNN and multiscale transformer fusion network (CMTFNet), to extract and fuse local information and multiscale global contextual information of high-resolution remote sensing images. Specifically, to further process the output features from the CNN encoder, we build a transformer decoder based on the multiscale…
Citation impact
- FWCI
- 45.60
- Percentile
- 100%
- References
- 58
Authors
5Topics & keywords
- Computer science
- Encoder
- Artificial intelligence
- Convolutional neural network
- Segmentation
- Transformer
- Pattern recognition (psychology)
- Computer vision