Article · Jun 1, 2023 · Closed access

ZegCLIP: Towards Adapting CLIP for Zero-shot Semantic Segmentation

University of Adelaide · Sichuan University

Indexed in Crossref

Abstract

Recently, CLIP has been applied to pixel-level zero-shot learning tasks via a two-stage scheme. The general idea is to first generate class-agnostic region proposals and then feed the cropped proposal regions to CLIP to utilize its image-level zero-shot classification capability. While effective, such a scheme requires two image encoders, one for proposal generation and one for CLIP, leading to a complicated pipeline and high computational cost. In this work, we pursue a simpler-and-efficient one-stage solution that directly extends CLIP's zero-shot prediction capability from image to pixel level. Our investigation starts with a straightforward extension as our baseline that generates semantic masks by…
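For readers unfamiliar with the two-stage scheme the abstract contrasts against, the following is a minimal sketch of its second stage under loud assumptions: the embeddings are hand-written toy vectors standing in for CLIP image and text encoder outputs, the masks stand in for the output of a separate proposal network, and the function names `classify_proposals` and `paint_segmentation` are hypothetical. It does not reproduce the paper's one-stage method.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def classify_proposals(proposal_embeddings, text_embeddings):
    """Assign each cropped proposal the class whose text embedding is most similar.

    proposal_embeddings: list of vectors (stand-ins for CLIP image embeddings of crops).
    text_embeddings: dict mapping class name to vector (stand-ins for CLIP text embeddings).
    """
    labels = []
    for emb in proposal_embeddings:
        scores = {name: cosine(emb, t) for name, t in text_embeddings.items()}
        labels.append(max(scores, key=scores.get))
    return labels

def paint_segmentation(masks, labels, height, width, background="unlabeled"):
    """Write each proposal's label into the pixels covered by its mask."""
    seg = [[background] * width for _ in range(height)]
    for mask, label in zip(masks, labels):
        for row, col in mask:
            seg[row][col] = label
    return seg

# Toy inputs (assumed, not from the paper): a 2x2 image with two region proposals.
text_embeddings = {"cat": [1.0, 0.0, 0.0], "grass": [0.0, 1.0, 0.0]}
masks = [{(0, 0), (0, 1)}, {(1, 0), (1, 1)}]
proposal_embeddings = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1]]

labels = classify_proposals(proposal_embeddings, text_embeddings)
segmentation = paint_segmentation(masks, labels, height=2, width=2)
```

In this sketch the class names can be anything for which a text embedding is available, which is what makes the classification step zero-shot. The cost the abstract points to comes from running one encoder to produce the masks and a second encoder pass per cropped proposal.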

Citation impact

  • Total citations: 205
  • FWCI: 34.00
  • Percentile: 100%
  • References: 75

Authors

5

Topics & keywords

Keywords
  • Overfitting
  • Computer science
  • Artificial intelligence
  • Margin (machine learning)
  • Segmentation
  • Pixel
  • Zero (linguistics)
  • Speedup