SkyEyeGPT: Unifying remote sensing vision-language tasks via instruction tuning with large language model

Northwestern Polytechnical University · Technical University of Munich

Indexed in Crossref

Abstract

Large language models (LLMs) have recently been extended to the vision-language realm, obtaining impressive general multi-modal capabilities. However, the exploration of multi-modal large language models (MLLMs) for remote sensing (RS) data is still in its infancy, lacking datasets and with unsatisfactory performance. In this work, we meticulously curate a large-scale RS multi-modal instruction tuning dataset, including single-task and multi-task conversation instructions. After manual verification, we obtain a high-quality RS instruction-following dataset with 968k samples, namely SkyEye-968k. To this end, we introduce SkyEyeGPT, a unified multi-modal large language model specifically designed for RS…

Citation impact

  • Total citations: 92
  • FWCI: 87.69
  • Percentile: 100%
  • References: 70

Authors

3

Topics & keywords

Keywords
  • Computer science
  • Language model
  • Artificial intelligence
  • Natural language processing
  • Human–computer interaction

Funding