Article, Jun 1, 2023, Closed access
CLIP is Also an Efficient Segmenter: A Text-Driven Approach for Weakly Supervised Semantic Segmentation
Indexed in Crossref
Abstract
Weakly supervised semantic segmentation (WSSS) with image-level labels is a challenging task. Mainstream approaches follow a multi-stage framework and suffer from high training costs. In this paper, we explore the potential of Contrastive Language-Image Pre-training models (CLIP) to localize different categories with only image-level labels and without further training. To efficiently generate high-quality segmentation masks from CLIP, we propose a novel WSSS framework called CLIP-ES. Our framework improves all three stages of WSSS with special designs for CLIP: 1) We introduce the softmax function into GradCAM and exploit the zero-shot ability of CLIP to suppress the confusion caused by non-target classes and…
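The abstract's first design point, applying softmax to the class scores before computing GradCAM, can be illustrated with a toy sketch. This is not the CLIP-ES implementation: the linear classification head over globally averaged features, the function names, and the random inputs are all assumptions chosen so the gradients can be written by hand in NumPy. It only shows why differentiating the softmax probability, rather than the raw logit, discounts evidence that is shared with non-target classes.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    e = np.exp(z - z.max())
    return e / e.sum()


def grad_cam(features, head, target, use_softmax=False):
    """Toy Grad-CAM for a hypothetical linear head on average-pooled features.

    features: (K, H, W) activation maps
    head:     (C, K) weights of an assumed linear classifier
    target:   index of the class to localize
    """
    n_channels, height, width = features.shape
    pooled = features.mean(axis=(1, 2))      # (K,)
    logits = head @ pooled                   # (C,)

    one_hot = np.eye(len(logits))[target]
    if use_softmax:
        # Gradient of softmax(logits)[target] with respect to the logits.
        probs = softmax(logits)
        dscore_dlogits = probs[target] * (one_hot - probs)
    else:
        # Plain Grad-CAM: gradient of the raw target logit.
        dscore_dlogits = one_hot

    # d logits[c] / d features[k, i, j] = head[c, k] / (H * W) at every pixel,
    # so the spatially averaged gradient per channel is the same value.
    channel_weights = (dscore_dlogits @ head) / (height * width)  # (K,)

    # Weighted sum of activation maps followed by ReLU.
    cam = np.tensordot(channel_weights, features, axes=1)         # (H, W)
    return np.maximum(cam, 0.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.random((4, 5, 5))
    head = rng.normal(size=(3, 4))
    print(grad_cam(feats, head, target=1).shape)
    print(grad_cam(feats, head, target=1, use_softmax=True).shape)
```

In this sketch the softmax variant weights each channel by the target class's head weights minus the probability-weighted average over all classes, so a channel that supports every class equally contributes nothing to the map.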
Citation impact
- Total citations: 204
- FWCI: 33.83
- Percentile: 100%
- References: 72
Authors: 8
Topics & keywords
Topics
Keywords
- Softmax function
- Computer science
- Segmentation
- Artificial intelligence
- Focus (optics)
- Natural language processing
- Pattern recognition (psychology)
- Machine learning
UN Sustainable Development Goals
- Quality Education