Article · IEEE Transactions on Geoscience and Remote Sensing · Jan 1, 2024 · Closed access

RemoteCLIP: A Vision Language Foundation Model for Remote Sensing

Hohai University · Hong Kong University of Science and Technology · and 4 other institutions

Indexed in Crossref

Abstract

General-purpose foundation models have led to recent breakthroughs in artificial intelligence. In remote sensing, self-supervised learning (SSL) and Masked Image Modeling (MIM) have been adopted to build foundation models. However, these models primarily learn low-level features and require annotated data for fine-tuning. Moreover, they are inapplicable for retrieval and zero-shot applications due to the lack of language understanding. To address these limitations, we propose RemoteCLIP, the first vision-language foundation model for remote sensing that aims to learn robust visual features with rich semantics and aligned text embeddings for seamless downstream application. To address the scarcity of…
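The abstract credits aligned text embeddings with enabling retrieval and zero-shot use. The sketch below illustrates the generic CLIP-style recipe under stated assumptions: it presumes image and text encoders have already mapped inputs into one shared embedding space, and it ranks candidates by cosine similarity. The three-dimensional vectors, class names, temperature value, and the `zero_shot_classify` and `retrieve` helpers are all hypothetical stand-ins. They are not RemoteCLIP's released API or its actual embeddings.

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embeddings in the shared space."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def zero_shot_classify(
    image_emb: Sequence[float],
    class_text_embs: Dict[str, Sequence[float]],
    temperature: float = 0.01,  # illustrative assumption, not a RemoteCLIP setting
) -> Tuple[str, Dict[str, float]]:
    """Pick the class whose text embedding is most similar to the image embedding.

    Similarities are divided by a temperature and passed through a softmax;
    no labelled training images are needed, only one text prompt per class.
    """
    labels = list(class_text_embs)
    logits = [cosine(image_emb, class_text_embs[c]) / temperature for c in labels]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    probs = {c: e / total for c, e in zip(labels, exps)}
    return max(probs, key=probs.get), probs


def retrieve(
    text_emb: Sequence[float], image_embs: Dict[str, Sequence[float]], k: int = 2
) -> List[str]:
    """Rank images by similarity to a text query and return the top-k names."""
    ranked = sorted(
        image_embs, key=lambda name: cosine(text_emb, image_embs[name]), reverse=True
    )
    return ranked[:k]


# Hypothetical 3-d embeddings standing in for encoder outputs of prompts such as
# "a satellite image of an airport"; real models use hundreds of dimensions.
class_text_embs = {
    "airport": [0.9, 0.1, 0.0],
    "farmland": [0.1, 0.9, 0.1],
    "harbor": [0.0, 0.2, 0.95],
}
image_embs = {
    "img_a": [0.8, 0.2, 0.1],
    "img_b": [0.05, 0.3, 0.9],
    "img_c": [0.1, 0.9, 0.2],
}

label, probs = zero_shot_classify(image_embs["img_a"], class_text_embs)
print(label)  # airport
print(retrieve(class_text_embs["harbor"], image_embs))  # ['img_b', 'img_c']
```

Because both modalities live in one embedding space, the same `cosine` call serves classification (one image scored against many class prompts) and retrieval (one text query scored against many images). Swapping in real encoders would change only where the vectors come from.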

Citation impact

Total citations: 314
FWCI (Field-Weighted Citation Impact): 70.03
Percentile: 100%
References: 124

Authors

8 authors

Topics & keywords

Keywords
  • Computer science
  • Artificial intelligence
  • Leverage (statistics)
  • Machine learning
  • Language model
  • Benchmark (surveying)
  • Information retrieval