GeoChat: Grounded Large Vision-Language Model for Remote Sensing
Mohamed bin Zayed University of Artificial Intelligence · Birla Institute of Technology and Science - Hyderabad Campus
Abstract
Recent advancements in Large Vision-Language Models (VLMs) have shown great promise in natural image domains, allowing users to hold a dialogue about given visual content. However, such general-domain VLMs perform poorly in Remote Sensing (RS) scenarios, producing inaccurate or fabricated information when presented with RS domain-specific queries. This behavior emerges from the unique challenges introduced by RS imagery. For example, to handle high-resolution RS imagery with diverse scale changes across categories and many small objects, region-level reasoning is necessary alongside holistic scene interpretation. Furthermore, the lack of domain-specific multimodal instruction-following data as well as…
Citation impact
- FWCI: 304.98
- Percentile: 100%
- References: 0
Authors (6)
- Kartik Kuckreja (corresponding author)
Mohamed bin Zayed University of Artificial Intelligence
- Muhammad Sohail Danish
Mohamed bin Zayed University of Artificial Intelligence
- Muzammal Naseer
Mohamed bin Zayed University of Artificial Intelligence
- Abhijit Das
Birla Institute of Technology and Science - Hyderabad Campus
- Salman Khan
Mohamed bin Zayed University of Artificial Intelligence
Topics & keywords
- Computer science
- Language model
- Computer vision
- Artificial intelligence
- Remote sensing
- Human–computer interaction
- Geography