ZegCLIP: Towards Adapting CLIP for Zero-shot Semantic Segmentation
University of Adelaide · Sichuan University
Abstract
Recently, CLIP has been applied to pixel-level zero-shot learning tasks via a two-stage scheme. The general idea is to first generate class-agnostic region proposals and then feed the cropped proposal regions to CLIP to utilize its image-level zero-shot classification capability. While effective, such a scheme requires two image encoders, one for proposal generation and one for CLIP, leading to a complicated pipeline and high computational cost. In this work, we pursue a simpler and more efficient one-stage solution that directly extends CLIP's zero-shot prediction capability from the image level to the pixel level. Our investigation starts with a straightforward extension as our baseline that generates semantic masks by…
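The two-stage scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the function name `two_stage_segment`, the `encode_image` callable, and the toy mean-colour encoder are all hypothetical stand-ins for a real proposal network and CLIP encoders. It only shows the control flow (crop each class-agnostic proposal, embed the crop, pick the class whose text embedding is most similar) and why the cost grows with the number of proposals.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so a dot product equals cosine similarity."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def two_stage_segment(image, proposal_masks, class_text_emb, encode_image):
    """Hypothetical two-stage zero-shot segmentation loop.

    image:           (H, W, 3) float array
    proposal_masks:  (N, H, W) bool array of class-agnostic proposals (stage 1 output)
    class_text_emb:  (C, D) array, one text embedding per class name
    encode_image:    callable mapping an image crop to a (D,) embedding (stage 2 encoder)
    Returns an (H, W) int label map; pixels covered by no proposal stay -1.
    """
    height, width = image.shape[:2]
    text = l2_normalize(class_text_emb)
    best_score = np.full((height, width), -np.inf)
    label_map = np.full((height, width), -1, dtype=int)

    for mask in proposal_masks:
        if not mask.any():
            continue
        # Crop the tight bounding box around the proposal.
        ys, xs = np.where(mask)
        crop = image[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
        # One encoder forward pass per proposal: this is the costly part.
        emb = l2_normalize(encode_image(crop))
        sims = text @ emb  # cosine similarity to every class, shape (C,)
        cls = int(np.argmax(sims))
        # Where proposals overlap, keep the most confident assignment.
        update = mask & (sims[cls] > best_score)
        label_map[update] = cls
        best_score[update] = sims[cls]
    return label_map


# Toy usage with assumed inputs: the "encoder" is just the mean colour of the crop,
# and the two "text embeddings" are pure red and pure green.
image = np.zeros((4, 6, 3))
image[:, :3, 0] = 1.0  # left half red
image[:, 3:, 1] = 1.0  # right half green
masks = np.zeros((2, 4, 6), dtype=bool)
masks[0, :, :3] = True
masks[1, :, 3:] = True
text_emb = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
labels = two_stage_segment(image, masks, text_emb,
                           lambda crop: crop.reshape(-1, 3).mean(axis=0))
```

In a real pipeline the proposal masks and the crop embeddings would come from two separate image encoders, which is the overhead the abstract identifies as the motivation for a one-stage design.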
Citation impact
- FWCI: 34.00
- Percentile: 100%
- References: 75
Authors: 5
Topics & keywords
- Overfitting
- Computer science
- Artificial intelligence
- Margin (machine learning)
- Segmentation
- Pixel
- Zero (linguistics)
- Speedup