Article | Jun 1, 2023 | Closed access

CLIP is Also an Efficient Segmenter: A Text-Driven Approach for Weakly Supervised Semantic Segmentation

Zhejiang University

Indexed in Crossref

Abstract

Weakly supervised semantic segmentation (WSSS) with image-level labels is a challenging task. Mainstream approaches follow a multi-stage framework and suffer from high training costs. In this paper, we explore the potential of Contrastive Language-Image Pre-training models (CLIP) to localize different categories with only image-level labels and without further training. To efficiently generate high-quality segmentation masks from CLIP, we propose a novel WSSS framework called CLIP-ES. Our framework improves all three stages of WSSS with special designs for CLIP: 1) We introduce the softmax function into GradCAM and exploit the zero-shot ability of CLIP to suppress the confusion caused by non-target classes and…
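The first design point, putting a softmax in front of GradCAM, can be illustrated with a toy NumPy sketch. This is not the CLIP-ES implementation: the linear classification head over globally average-pooled feature maps, and the names `softmax_gradcam`, `features`, and `head`, are assumptions made only for illustration. The sketch shows why differentiating the softmax probability rather than the raw logit gives channels that support non-target classes a negative weight, so they are removed by the final ReLU.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def softmax_gradcam(features, head, target):
    """Toy GradCAM on a softmax score (illustrative assumption, not CLIP-ES).

    features: (K, H, W) activation maps.
    head:     (C, K) weights of a hypothetical linear classifier applied
              to the globally average-pooled features.
    target:   index of the class to localize.
    """
    K, H, W = features.shape
    pooled = features.mean(axis=(1, 2))   # (K,)
    logits = head @ pooled                # (C,)
    probs = softmax(logits)

    # d probs[target] / d logits[j] = p_t * (delta_tj - p_j)
    dlogits = -probs[target] * probs
    dlogits[target] += probs[target]

    # For this toy model the gradient w.r.t. every pixel of channel k is the
    # same constant, so its spatial average (the GradCAM weight) equals it.
    weights = (head.T @ dlogits) / (H * W)   # (K,)

    # Weighted sum of the feature maps, followed by ReLU.
    cam = np.maximum((weights[:, None, None] * features).sum(axis=0), 0.0)
    return cam, probs


# Two channels, each tied to one class by an identity head.
features = np.zeros((2, 2, 2))
features[0, 0, 0] = 1.0   # evidence for class 0 in the top-left pixel
features[1, 1, 1] = 1.0   # evidence for class 1 in the bottom-right pixel
cam, probs = softmax_gradcam(features, np.eye(2), target=0)
```

In this example the channel tied to class 1 receives a negative weight when class 0 is the target, so its location contributes nothing to the map after the ReLU.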

Citation impact

Total citations: 204
FWCI: 33.83
Percentile: 100%
References: 72

Authors: 8

Topics & keywords

Keywords
  • Softmax function
  • Computer science
  • Segmentation
  • Artificial intelligence
  • Focus (optics)
  • Natural language processing
  • Pattern recognition (psychology)
  • Machine learning
UN Sustainable Development Goals
  • Quality Education