GeoChat: Grounded Large Vision-Language Model for Remote Sensing

Kuckreja, Kartik; Danish, Muhammad Sohail; Naseer, Muzammal; Das, Abhijit; Khan, Salman; Khan, Fahad Shahbaz

doi:10.1109/cvpr52733.2024.02629

Article · Jun 16, 2024 · Closed access

Mohamed bin Zayed University of Artificial Intelligence · Birla Institute of Technology and Science - Hyderabad Campus

Indexed in Crossref

Abstract

Recent advancements in Large Vision-Language Models (VLMs) have shown great promise in natural image domains, allowing users to hold a dialogue about given visual content. However, such general-domain VLMs perform poorly for Remote Sensing (RS) scenarios, leading to inaccurate or fabricated information when presented with RS domain-specific queries. Such a behavior emerges due to the unique challenges introduced by RS imagery. For example, to handle high-resolution RS imagery with diverse scale changes across categories and many small objects, region-level reasoning is necessary alongside holistic scene interpretation. Furthermore, the lack of domain-specific multimodal instruction following data as well as…

Citation impact

Total citations: 171

FWCI: 304.98
Percentile: 100%
References: 0

Authors: 6

Topics & keywords

Topics

Geographic Information Systems Studies (92%)

Keywords

Computer science
Language model
Computer vision
Artificial intelligence
Remote sensing
Human–computer interaction
Geography
