Article · 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) · Jun 1, 2022 · Closed access
GroupViT: Semantic Segmentation Emerges from Text Supervision
Indexed in Crossref
Abstract
Grouping and recognition are important components of visual scene understanding, e.g., for object detection and semantic segmentation. With end-to-end deep learning systems, grouping of image regions usually happens implicitly via top-down supervision from pixel-level recognition labels. Instead, in this paper, we propose to bring back the grouping mechanism into deep networks, which allows semantic segments to emerge automatically with only text supervision. We propose a hierarchical Grouping Vision Transformer (GroupViT), which goes beyond the regular grid structure representation and learns to group image regions into progressively larger arbitrary-shaped segments. We train GroupViT jointly with a text…
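The idea of grouping image regions into progressively larger segments can be illustrated with a toy example. The following is a minimal sketch under stated assumptions, not the paper's implementation: the abstract does not specify how grouping is computed, so this sketch assumes a hard nearest-center assignment by dot-product similarity followed by averaging. The names `group_tokens`, `hierarchical_grouping`, and `centers_per_stage` are hypothetical and chosen only for illustration.

```python
def dot(a, b):
    """Dot product of two equal-length vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))


def group_tokens(segment_tokens, group_centers):
    """Assign each token to its most similar center, then average each group.

    Assumption (not from the source): hard argmax assignment by dot product.
    Returns (assignments, merged_tokens), one merged token per center.
    """
    assignments = []
    for tok in segment_tokens:
        scores = [dot(tok, c) for c in group_centers]
        assignments.append(scores.index(max(scores)))

    merged = []
    for g, center in enumerate(group_centers):
        members = [t for t, a in zip(segment_tokens, assignments) if a == g]
        if members:
            # Mean of all member tokens, computed per feature dimension.
            merged.append([sum(col) / len(members) for col in zip(*members)])
        else:
            # Empty group: fall back to the center itself.
            merged.append(list(center))
    return assignments, merged


def hierarchical_grouping(tokens, centers_per_stage):
    """Apply grouping stage by stage, so segments grow progressively larger.

    Returns a map from each original token index to its final segment index,
    plus the final segment tokens.
    """
    token_to_segment = list(range(len(tokens)))
    for centers in centers_per_stage:
        assignments, tokens = group_tokens(tokens, centers)
        # Compose this stage's assignment with the mapping so far.
        token_to_segment = [assignments[s] for s in token_to_segment]
    return token_to_segment, tokens


if __name__ == "__main__":
    # Four toy "patch" features and one grouping stage with two centers.
    patches = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    stage1 = [[1.0, 0.0], [0.0, 1.0]]
    mapping, segments = hierarchical_grouping(patches, [stage1])
    print(mapping, segments)
```

Because assignments are made per token rather than per grid cell, the resulting segments can take arbitrary shapes; stacking stages with fewer centers each time merges them into larger segments.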
Citation impact
- Total citations: 403
- FWCI: 22.42
- Percentile: 100%
- References: 98
Authors: 7
Topics & keywords
Keywords
- Computer science
- Pascal (unit)
- Segmentation
- Artificial intelligence
- Encoder
- Transformer
- Natural language processing
- Pattern recognition (psychology)