article · Jun 16, 2024 · Closed access

GeoChat: Grounded Large Vision-Language Model for Remote Sensing

Mohamed bin Zayed University of Artificial Intelligence · Birla Institute of Technology and Science - Hyderabad Campus

Indexed in Crossref

Abstract

Recent advancements in Large Vision-Language Models (VLMs) have shown great promise in natural image domains, allowing users to hold a dialogue about given visual content. However, such general-domain VLMs perform poorly in Remote Sensing (RS) scenarios, producing inaccurate or fabricated information when presented with RS domain-specific queries. This behavior stems from the unique challenges introduced by RS imagery. For example, to handle high-resolution RS imagery with diverse scale changes across categories and many small objects, region-level reasoning is necessary alongside holistic scene interpretation. Furthermore, the lack of domain-specific multimodal instruction-following data as well as…
