RemoteCLIP: A Vision Language Foundation Model for Remote Sensing

Liu, Fan; Chen, Delong; Guan, Zhangqingyun; Zhou, Xiaocong; Zhu, Jiale; Ye, Qiaolin; Fu, Liyong; Zhou, Jun

doi:10.1109/tgrs.2024.3390838

Article · IEEE Transactions on Geoscience and Remote Sensing · Jan 1, 2024 · Closed access

Fan Liu · Delong Chen · Zhangqingyun Guan · Xiaocong Zhou · Jiale Zhu

Hohai University · Hong Kong University of Science and Technology · +4 more institutions

Indexed in Crossref

Abstract

General-purpose foundation models have led to recent breakthroughs in artificial intelligence. In remote sensing, self-supervised learning (SSL) and Masked Image Modeling (MIM) have been adopted to build foundation models. However, these models primarily learn low-level features and require annotated data for fine-tuning. Moreover, they are inapplicable for retrieval and zero-shot applications due to the lack of language understanding. To address these limitations, we propose RemoteCLIP, the first vision-language foundation model for remote sensing that aims to learn robust visual features with rich semantics and aligned text embeddings for seamless downstream application. To address the scarcity of…

Citation impact

Total citations: 314

FWCI: 70.03
Percentile: 100%
References: 124

Authors: 8

Topics & keywords

Keywords

Computer science
Artificial intelligence
Leverage (statistics)
Machine learning
Language model
Benchmark (surveying)
Information retrieval
