articleIEEE Transactions on Geoscience and Remote SensingJan 1, 2023Closed access

Extended Vision Transformer (ExViT) for Land Use and Land Cover Classification: A Multimodal Deep Learning Framework

Chinese Academy of Sciences · Aerospace Information Research Institute · +5 more institutions

Indexed incrossref

Abstract

The recent success of attention mechanism-driven deep models, like Vision Transformer (ViT) as one of the most representative, has intrigued a wave of advanced research to explore their adaptation to broader domains. However, current Transformer-based approaches in the remote sensing (RS) community pay more attention to single-modality data, which might lose expandability in making full use of the ever-growing multimodal Earth observation data. To this end, we propose a novel multimodal deep learning framework by extending conventional ViT with minimal modifications, abbreviated as ExViT, aiming at the task of land use and land cover classification. Unlike common stems that adopt either linear patch projection…

No related works found for this paper.

Funding