Article · IEEE Transactions on Geoscience and Remote Sensing · Jan 1, 2023 · Closed access

Multimodal Fusion Transformer for Remote Sensing Image Classification

Technical University of Munich · Chinese Academy of Sciences · +7 more institutions

Indexed in Crossref

Abstract

Vision transformers (ViTs) have been trending in image classification tasks due to their promising performance when compared to convolutional neural networks (CNNs). As a result, many researchers have tried to incorporate ViTs in hyperspectral image (HSI) classification tasks. To achieve satisfactory performance, close to that of CNNs, transformers need fewer parameters. ViTs and other similar transformers use an external classification (CLS) token, which is randomly initialized and often fails to generalize well, whereas other sources of multimodal datasets, such as light detection and ranging (LiDAR), offer the potential to improve these models by means of a CLS. In this paper, we introduce a new multimodal…

Citation impact

  • Total citations: 455
  • FWCI: 67.30
  • Percentile: 100%
  • References: 74

Authors

6

Topics & keywords

Keywords
  • Computer science
  • Transformer
  • Artificial intelligence
  • Convolutional neural network
  • CLs upper limits
  • Encoder
  • Contextual image classification
  • Pattern recognition (psychology)
UN Sustainable Development Goals
  • Life below water

Funding