ZegCLIP: Towards Adapting CLIP for Zero-shot Semantic Segmentation

Zhou, Ziqin; Lei, Yinjie; Zhang, Bowen; Liu, Lingqiao; Liu, Yifan

doi:10.1109/cvpr52729.2023.01075

Article · Jun 1, 2023 · Closed access

University of Adelaide · Sichuan University

Indexed in Crossref

Abstract

Recently, CLIP has been applied to pixel-level zero-shot learning tasks via a two-stage scheme. The general idea is to first generate class-agnostic region proposals and then feed the cropped proposal regions to CLIP to utilize its image-level zero-shot classification capability. While effective, such a scheme requires two image encoders, one for proposal generation and one for CLIP, leading to a complicated pipeline and high computational cost. In this work, we pursue a simpler-and-efficient one-stage solution that directly extends CLIP's zero-shot prediction capability from image to pixel level. Our investigation starts with a straightforward extension as our baseline that generates semantic masks by…
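The two-stage scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' code and it does not cover the one-stage method, whose details are cut off above. It assumes the class-agnostic proposals are already given as sets of pixel coordinates, and it uses a hypothetical `encode_region` callable as a stand-in for running the CLIP image encoder on a cropped proposal. The names `two_stage_segment`, `classify_region`, and `toy_encoder`, and the two-dimensional embeddings, are invented for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify_region(region_embedding, text_embeddings):
    """Image-level zero-shot step: pick the class whose text embedding
    is most similar to the embedding of the cropped region."""
    return max(text_embeddings,
               key=lambda name: cosine(region_embedding, text_embeddings[name]))


def two_stage_segment(height, width, proposals, encode_region, text_embeddings):
    """Stage 1 (assumed done elsewhere): `proposals` is a list of sets of
    (row, col) pixels, with no class attached.
    Stage 2: each proposal is encoded by `encode_region` (hypothetical
    stand-in for a CLIP image encoder) and labelled as a whole."""
    label_map = [[None] * width for _ in range(height)]
    for mask in proposals:
        label = classify_region(encode_region(mask), text_embeddings)
        for r, c in mask:
            label_map[r][c] = label
    return label_map


# Toy data: made-up 2-D "text embeddings" and a fake encoder.
text_embeddings = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}


def toy_encoder(mask):
    # Pretend regions in column 0 look like "cat", everything else like "dog".
    in_left_column = all(c < 1 for _, c in mask)
    return [1.0, 0.1] if in_left_column else [0.1, 1.0]


proposals = [{(0, 0), (1, 0)}, {(0, 1), (1, 1)}]
segmentation = two_stage_segment(2, 2, proposals, toy_encoder, text_embeddings)
print(segmentation)
```

In this sketch every proposal needs its own encoder call, on top of whatever model produced the proposals, which is the pipeline complexity and computational cost the abstract attributes to two-stage approaches.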

Citation impact

Total citations: 205

FWCI: 34.00
Percentile: 100%
References: 75

Authors: 5

Topics & keywords

Keywords

Overfitting
Computer science
Artificial intelligence
Margin (machine learning)
Segmentation
Pixel
Zero (linguistics)
Speedup
