EarthGPT: A Universal Multimodal Large Language Model for Multisensor Image Comprehension in Remote Sensing Domain

Zhang, Wei; Cai, Miaoxin; Zhang, Tong; Zhuang, Yin; Mao, Xuerui

doi:10.1109/tgrs.2024.3409624

Article, IEEE Transactions on Geoscience and Remote Sensing, Jan 1, 2024, Closed access

Beijing Institute of Technology

Indexed in Crossref

Abstract

Multi-modal large language models (MLLMs) have demonstrated remarkable success in vision and visual-language tasks within the natural image domain. Owing to the significant domain gap between natural and remote sensing (RS) images, the development of MLLMs in the RS domain is still in its infancy. To fill this gap, this paper proposes EarthGPT, a pioneering MLLM that uniformly integrates various multi-sensor RS interpretation tasks for universal RS image comprehension. First, a visual-enhanced perception mechanism is constructed to refine and incorporate coarse-scale semantic perception information and fine-scale detailed perception information. Second, a cross-modal mutual comprehension approach…

Citation impact

Total citations: 131

FWCI: 29.61
Percentile: 100%
References: 131

Authors: 5

Topics & keywords

Keywords

Computer science
Modal
Remote sensing
Image sensor
Domain (mathematical analysis)
Image (mathematics)
Comprehension
Artificial intelligence

UN Sustainable Development Goals

Quality Education

Funding

National Natural Science Foundation of China
Awards: 92152109, 62371048