Article | IEEE Transactions on Geoscience and Remote Sensing | Jan 1, 2024 | Closed access

EarthGPT: A Universal Multimodal Large Language Model for Multisensor Image Comprehension in Remote Sensing Domain

Beijing Institute of Technology

Indexed in Crossref

Abstract

Multimodal large language models (MLLMs) have demonstrated remarkable success in vision and visual-language tasks within the natural image domain. Owing to the significant domain gap between natural and remote sensing (RS) images, the development of MLLMs in the RS domain is still in its infancy. To fill this gap, this paper proposes EarthGPT, a pioneering MLLM that uniformly integrates various multi-sensor RS interpretation tasks for universal RS image comprehension. First, a visual-enhanced perception mechanism is constructed to refine and incorporate coarse-scale semantic perception information and fine-scale detailed perception information. Second, a cross-modal mutual comprehension approach…

Citation impact

Total citations: 131
FWCI: 29.61
Percentile: 100%
References: 131

Authors

5

Topics & keywords

Keywords
  • Computer science
  • Modal
  • Remote sensing
  • Image sensor
  • Domain (mathematical analysis)
  • Image (mathematics)
  • Comprehension
  • Artificial intelligence
UN Sustainable Development Goals
  • Quality Education

Funding