Token Contrast for Weakly-Supervised Semantic Segmentation
Wuhan University · Jingdong (China)
Abstract
Weakly-Supervised Semantic Segmentation (WSSS) using image-level labels typically uses Class Activation Maps (CAM) to generate pseudo labels. Limited by the local structure perception of CNNs, CAM usually cannot identify the integral object regions. Though the recent Vision Transformer (ViT) can remedy this flaw, we observe that it also brings an over-smoothing issue, i.e., the final patch tokens tend to become uniform. In this work, we propose Token Contrast (ToCo) to address this issue and further explore the virtue of ViT for WSSS. Firstly, motivated by the observation that intermediate layers in ViT can still retain semantic diversity, we design a Patch Token Contrast module (PTC). PTC supervises the…
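The over-smoothing observation in the abstract can be made concrete with a small diagnostic. The sketch below is not the paper's method or code: it assumes over-smoothing can be summarized by the mean pairwise cosine similarity between patch tokens, and the function name `mean_pairwise_cosine` and the toy token values are hypothetical, chosen only for illustration.

```python
import math


def mean_pairwise_cosine(tokens):
    """Average cosine similarity over all distinct pairs of token vectors.

    Assumes at least two tokens and no all-zero token.
    """
    # L2-normalise each token so that a dot product equals cosine similarity.
    unit = []
    for t in tokens:
        norm = math.sqrt(sum(x * x for x in t))
        unit.append([x / norm for x in t])

    total, count = 0.0, 0
    for i in range(len(unit)):
        for j in range(i + 1, len(unit)):
            total += sum(a * b for a, b in zip(unit[i], unit[j]))
            count += 1
    return total / count


# Hand-made toy values (not results from the paper): diverse tokens standing in
# for an intermediate layer, near-identical tokens standing in for the final layer.
intermediate_tokens = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
final_tokens = [[1.0, 0.1], [1.0, 0.0], [0.9, 0.1]]

sim_mid = mean_pairwise_cosine(intermediate_tokens)
sim_final = mean_pairwise_cosine(final_tokens)
print(f"intermediate: {sim_mid:.3f}, final: {sim_final:.3f}")
```

A value close to 1 for the final-layer tokens alongside a clearly lower value for an intermediate layer would be consistent with the over-smoothing behaviour the abstract describes. With a real ViT, the token lists would come from the per-layer patch embeddings rather than hand-written numbers.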
Citation impact
- FWCI: 20.31
- Percentile: 100%
- References: 58
Authors
- 4
Topics & keywords
- Computer science
- Security token
- Segmentation
- Contrast (vision)
- Artificial intelligence
- Semantics (computer science)
- Class (philosophy)
- Pattern recognition (psychology)