SkyEyeGPT: Unifying remote sensing vision-language tasks via instruction tuning with large language model
Northwestern Polytechnical University · Technical University of Munich
Abstract
Large language models (LLMs) have recently been extended to the vision-language realm, obtaining impressive general multi-modal capabilities. However, the exploration of multi-modal large language models (MLLMs) for remote sensing (RS) data is still in its infancy, hampered by a lack of datasets and by unsatisfactory performance. In this work, we meticulously curate a large-scale RS multi-modal instruction tuning dataset, including single-task and multi-task conversation instructions. After manual verification, we obtain a high-quality RS instruction-following dataset with 968k samples, namely SkyEye-968k. To this end, we introduce SkyEyeGPT, a unified multi-modal large language model specifically designed for RS…
Citation impact
- FWCI: 87.69
- Percentile: 100%
- References: 70
Authors: 3
Topics & keywords
- Computer science
- Language model
- Artificial intelligence
- Natural language processing
- Human–computer interaction