Article · Jun 1, 2023 · Closed access
Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models
Indexed in Crossref
Abstract
We present ODISE: Open-vocabulary DIffusion-based panoptic SEgmentation, which unifies pre-trained text-image diffusion and discriminative models to perform open-vocabulary panoptic segmentation. Text-to-image diffusion models have the remarkable ability to generate high-quality images with diverse open-vocabulary language descriptions. This demonstrates that their internal representation space is highly correlated with open concepts in the real world. Text-image discriminative models like CLIP, on the other hand, are good at classifying images into open-vocabulary labels. We leverage the frozen internal representations of both these models to perform panoptic segmentation of any category in the wild. Our…
Citation impact
- Total citations: 312
- FWCI: 35.52
- Percentile: 100%
- References: 137
Authors: 6
Topics & keywords
Keywords
- Computer science
- Vocabulary
- Artificial intelligence
- Segmentation
- Discriminative model
- Panopticon
- Image segmentation
- Natural language processing
UN Sustainable Development Goals
- Reduced inequalities