Multimodal Fusion Transformer for Remote Sensing Image Classification
Technical University of Munich · Chinese Academy of Sciences · +7 more institutions
Abstract
Vision transformers (ViTs) have been trending in image classification tasks due to their promising performance when compared to convolutional neural networks (CNNs). As a result, many researchers have tried to incorporate ViTs in hyperspectral image (HSI) classification tasks. To achieve satisfactory performance, close to that of CNNs, transformers need fewer parameters. ViTs and other similar transformers use an external classification (CLS) token, which is randomly initialized and often fails to generalize well, whereas other sources of multimodal data, such as light detection and ranging (LiDAR), offer the potential to improve these models by means of a CLS. In this paper, we introduce a new multimodal…
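The contrast the abstract draws, between a randomly initialized CLS token and one informed by a second modality such as LiDAR, can be illustrated with a minimal sketch. The abstract is truncated here, so this is not the paper's architecture: the function names, the tiny dimensions, and the identity projection standing in for learned weights are all assumptions made for illustration.

```python
import random

def random_cls_token(dim, seed=0):
    """ViT-style CLS token: a small random vector that is then learned in training."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 0.02) for _ in range(dim)]

def lidar_cls_token(lidar_patch, weights):
    """Hypothetical alternative: linearly project a flattened LiDAR patch
    to the embedding size so the CLS token carries data from that modality."""
    flat = [value for row in lidar_patch for value in row]
    return [sum(w * x for w, x in zip(w_row, flat)) for w_row in weights]

def prepend_cls(cls_token, patch_tokens):
    """Place the CLS token at the front of the patch-token sequence."""
    return [cls_token] + patch_tokens

# Toy example (all values are made up): three HSI patch tokens of size 4.
dim = 4
hsi_tokens = [[0.1] * dim, [0.2] * dim, [0.3] * dim]

# A 2x2 LiDAR patch and an identity matrix standing in for learned weights.
lidar_patch = [[1.0, 2.0], [3.0, 4.0]]
weights = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(dim)]

seq_random = prepend_cls(random_cls_token(dim), hsi_tokens)
seq_lidar = prepend_cls(lidar_cls_token(lidar_patch, weights), hsi_tokens)
```

In both cases the encoder sees a sequence of four tokens; the only difference is whether the first token starts from noise or from the LiDAR measurements at the same location.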
Citation impact
- FWCI (field-weighted citation impact): 67.30
- Percentile: 100%
- References: 74
Authors (6)
- Swalpa Kumar Roy (corresponding author)
- Ankur Deria
Technical University of Munich
- Danfeng Hong
Chinese Academy of Sciences, Aerospace Information Research Institute
- Behnood Rasti
Helmholtz-Zentrum Dresden-Rossendorf, Helmholtz Institute Freiberg for Resource Technology
- Antonio Plaza
Universidad de Extremadura
Topics & keywords
- Computer science
- Transformer
- Artificial intelligence
- Convolutional neural network
- CLs upper limits
- Encoder
- Contextual image classification
- Pattern recognition (psychology)
- Life below water