CLIP is Also an Efficient Segmenter: A Text-Driven Approach for Weakly Supervised Semantic Segmentation

Lin, Yuqi; Chen, Minghao; Wang, Wenxiao; Wu, Boxi; Li, Ke; Lin, Binbin; Liu, Haifeng; He, Xiaofei

doi:10.1109/cvpr52729.2023.01469

Article | Jun 1, 2023 | Closed access

Zhejiang University

Indexed in Crossref

Abstract

Weakly supervised semantic segmentation (WSSS) with image-level labels is a challenging task. Mainstream approaches follow a multi-stage framework and suffer from high training costs. In this paper, we explore the potential of Contrastive Language-Image Pre-training models (CLIP) to localize different categories with only image-level labels and without further training. To efficiently generate high-quality segmentation masks from CLIP, we propose a novel WSSS framework called CLIP-ES. Our framework improves all three stages of WSSS with special designs for CLIP: 1) We introduce the softmax function into GradCAM and exploit the zero-shot ability of CLIP to suppress the confusion caused by non-target classes and…
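The first design point in the abstract, applying softmax before GradCAM, can be illustrated with a toy example. The sketch below is not the CLIP-ES implementation: it assumes a hypothetical model made of global average pooling followed by a linear classifier, and the names `softmax_gradcam`, `features`, and `weights` are invented for illustration. It only shows why differentiating the softmax probability, rather than the raw logit, gives negative channel weights to evidence for competing classes.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()


def softmax_gradcam(features, weights, target):
    """GradCAM on the softmax probability of `target` for a toy model.

    Assumed toy model (hypothetical, not CLIP): logits = weights @ GAP(features).
    features: (K, H, W) activation maps, weights: (C, K), target: class index.
    Returns the (H, W) activation map and the class probabilities.
    """
    K, H, W = features.shape
    pooled = features.mean(axis=(1, 2))            # global average pooling, (K,)
    probs = softmax(weights @ pooled)              # class probabilities, (C,)

    # Derivative of probs[target] with respect to each logit.
    onehot = np.zeros_like(probs)
    onehot[target] = 1.0
    dscore_dlogits = probs[target] * (onehot - probs)

    # Chain rule through the linear layer and the pooling. The gradient is
    # constant over spatial positions, so its spatial mean equals this value.
    channel_weights = (weights.T @ dscore_dlogits) / (H * W)

    # Weighted sum of the activation maps followed by ReLU.
    cam = np.maximum(0.0, np.tensordot(channel_weights, features, axes=1))
    return cam, probs


# Channel 0 fires on the left half, channel 1 on the right half.
features = np.zeros((2, 2, 4))
features[0, :, :2] = 1.0
features[1, :, 2:] = 1.0
weights = np.eye(2)                                # class c reads channel c only

cam, probs = softmax_gradcam(features, weights, target=0)
```

In this toy setup the map for class 0 is positive on the left half and zero on the right half, because the channel supporting class 1 receives a negative weight that the ReLU removes. With the raw logit as the score, that channel would instead receive a weight of zero.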

Citation impact

Total citations: 204

FWCI: 33.83
Percentile: 100%
References: 72

Authors: 8

Topics & keywords

Keywords

Softmax function
Computer science
Segmentation
Artificial intelligence
Focus (optics)
Natural language processing
Pattern recognition (psychology)
Machine learning

UN Sustainable Development Goals

Quality Education
