Multimodal Fusion Transformer for Remote Sensing Image Classification

Roy, Swalpa Kumar; Deria, Ankur; Hong, Danfeng; Rasti, Behnood; Plaza, Antonio; Chanussot, Jocelyn

doi:10.1109/tgrs.2023.3286826

Article · IEEE Transactions on Geoscience and Remote Sensing · Jan 1, 2023 · Closed access

Affiliations: Technical University of Munich · Chinese Academy of Sciences · 7 other institutions

Indexed in Crossref

Abstract

Vision transformers (ViTs) have gained popularity in image classification owing to their promising performance compared with convolutional neural networks (CNNs). As a result, many researchers have tried to incorporate ViTs into hyperspectral image (HSI) classification. To achieve satisfactory performance, close to that of CNNs, transformers need fewer parameters. ViTs and similar transformers use an external classification (CLS) token that is randomly initialized and often fails to generalize well, whereas other data sources in multimodal datasets, such as light detection and ranging (LiDAR), offer the potential to improve these models by supplying the CLS token. In this paper, we introduce a new multimodal…
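To make the CLS-token idea concrete, here is a minimal NumPy sketch, assuming single-head attention in which the CLS token is built by projecting a co-registered LiDAR patch instead of being a randomly initialized vector, and only that token issues the query over the HSI patch tokens. The function names (`cls_from_lidar`, `cls_cross_attention`), the patch size, and the weight shapes are hypothetical choices for illustration; this is not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cls_from_lidar(lidar_patch, w_proj):
    # Hypothetical tokenizer: flatten the LiDAR patch and project it
    # to the model dimension d, giving a data-driven CLS token
    return lidar_patch.reshape(-1) @ w_proj  # shape (d,)

def cls_cross_attention(cls_token, hsi_tokens, w_q, w_k, w_v):
    # Single-head attention where only the CLS token forms the query;
    # keys/values come from [CLS; HSI patch tokens]
    tokens = np.vstack([cls_token[None, :], hsi_tokens])  # (n+1, d)
    q = cls_token @ w_q                                    # (d,)
    k = tokens @ w_k                                       # (n+1, d)
    v = tokens @ w_v                                       # (n+1, d)
    scores = k @ q / np.sqrt(q.shape[-1])                  # (n+1,)
    attn = softmax(scores)                                 # weights sum to 1
    return attn @ v, attn                                  # updated CLS (d,), weights (n+1,)

# Usage with random stand-in data (assumed sizes, not from the paper)
rng = np.random.default_rng(0)
d, n = 16, 25                                  # model dim, number of HSI patch tokens
hsi_tokens = rng.standard_normal((n, d))       # e.g. tokens from a 5x5 HSI patch
lidar_patch = rng.standard_normal((5, 5))      # co-registered 5x5 LiDAR patch
w_proj = rng.standard_normal((25, d)) / 5.0
w_q, w_k, w_v = (rng.standard_normal((d, d)) / 4.0 for _ in range(3))

cls_token = cls_from_lidar(lidar_patch, w_proj)
out, attn = cls_cross_attention(cls_token, hsi_tokens, w_q, w_k, w_v)
print(out.shape, attn.shape)  # (16,) (26,)
```

Swapping `cls_token` for `rng.standard_normal(d)` reproduces the randomly initialized CLS token that the abstract contrasts against; in the sketch, the only difference is where the query vector comes from.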

Citation impact

Total citations: 455
FWCI: 67.30
Percentile: 100%
References: 74
Authors: 6

Topics & keywords

Keywords

Computer science
Transformer
Artificial intelligence
Convolutional neural network
CLs upper limits
Encoder
Contextual image classification
Pattern recognition (psychology)

UN Sustainable Development Goals

Life below water
