EarthGPT: A Universal Multimodal Large Language Model for Multisensor Image Comprehension in Remote Sensing Domain
Beijing Institute of Technology
Indexed in Crossref
Abstract
Multi-modal large language models (MLLMs) have demonstrated remarkable success in vision and vision-language tasks within the natural image domain. Owing to the significant domain gap between natural and remote sensing (RS) images, the development of MLLMs in the RS domain is still in its infancy. To fill this gap, this paper proposes EarthGPT, a pioneering MLLM that uniformly integrates various multi-sensor RS interpretation tasks for universal RS image comprehension. First, a visual-enhanced perception mechanism is constructed to refine and incorporate coarse-scale semantic perception information and fine-scale detailed perception information. Secondly, a cross-modal mutual comprehension approach…
Citation impact
131 total citations
- FWCI: 29.61
- Percentile: 100%
- References: 131
Authors: 5
Topics & keywords
Topics
Keywords
- Computer science
- Modal
- Remote sensing
- Image sensor
- Domain (mathematical analysis)
- Image (mathematics)
- Comprehension
- Artificial intelligence
UN Sustainable Development Goals
- Quality Education