SkyEyeGPT: Unifying remote sensing vision-language tasks via instruction tuning with large language model

Zhan, Yang; Xiong, Zhitong; Yuan, Yuan

doi:10.1016/j.isprsjprs.2025.01.020

Article · ISPRS Journal of Photogrammetry and Remote Sensing · Feb 5, 2025 · Hybrid open access

Affiliations: Northwestern Polytechnical University · Technical University of Munich

Indexed in Crossref

Abstract

Large language models (LLMs) have recently been extended to the vision-language realm, achieving impressive general multi-modal capabilities. However, the exploration of multi-modal large language models (MLLMs) for remote sensing (RS) data is still in its infancy: suitable datasets are lacking and performance remains unsatisfactory. In this work, we meticulously curate a large-scale RS multi-modal instruction tuning dataset, including single-task and multi-task conversation instructions. After manual verification, we obtain a high-quality RS instruction-following dataset with 968k samples, named SkyEye-968k. Building on this dataset, we introduce SkyEyeGPT, a unified multi-modal large language model specifically designed for RS…

Citation impact

92 total citations

FWCI: 87.69
Percentile: 100%
References: 70

Authors: 3

Topics & keywords

Keywords:

Computer science
Language model
Artificial intelligence
Natural language processing
Human–computer interaction

Funding

National Natural Science Foundation of China