Article · Jun 1, 2023 · Closed access

Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models

UC San Diego Health System

Indexed in Crossref

Abstract

We present ODISE: Open-vocabulary DIffusion-based panoptic SEgmentation, which unifies pre-trained text-image diffusion and discriminative models to perform open-vocabulary panoptic segmentation. Text-to-image diffusion models have the remarkable ability to generate high-quality images with diverse open-vocabulary language descriptions. This demonstrates that their internal representation space is highly correlated with open concepts in the real world. Text-image discriminative models like CLIP, on the other hand, are good at classifying images into open-vocabulary labels. We leverage the frozen internal representations of both these models to perform panoptic segmentation of any category in the wild. Our…
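The abstract credits CLIP-style discriminative models with classifying images into open-vocabulary labels. Below is a minimal sketch of that generic zero-shot step, not ODISE's implementation. It assumes each segmented region has already been mapped to an embedding in the same space as text embeddings of the label names. Each label is scored by cosine similarity, and a temperature-scaled softmax turns the scores into probabilities. The function names, the toy 3-dimensional vectors, and the temperature of 0.01 are hypothetical. The diffusion-model feature extraction that ODISE adds is not modeled here.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def classify_open_vocab(
    region_embedding: Sequence[float],
    label_embeddings: Dict[str, Sequence[float]],
    temperature: float = 0.01,  # hypothetical value, chosen for illustration
) -> Dict[str, float]:
    """Hypothetical helper: softmax over cosine similarities to each label embedding."""
    r = l2_normalize(region_embedding)
    logits = {}
    for label, emb in label_embeddings.items():
        t = l2_normalize(emb)
        cosine = sum(a * b for a, b in zip(r, t))
        logits[label] = cosine / temperature
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    z = sum(exps.values())
    return {k: v / z for k, v in exps.items()}


if __name__ == "__main__":
    # Toy stand-ins for text-encoder outputs; real embeddings would be far larger.
    labels = {"cat": [1.0, 0.1, 0.0], "dog": [0.1, 1.0, 0.0], "tree": [0.0, 0.0, 1.0]}
    probs = classify_open_vocab([0.9, 0.2, 0.05], labels)
    print(max(probs, key=probs.get))  # prints "cat"
```

The vocabulary is "open" in this sketch only in the sense that adding a category means adding one more text embedding to `label_embeddings`. No classifier weights are retrained.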

Citation impact

Total citations: 312
FWCI (Field-Weighted Citation Impact): 35.52
Percentile: 100%
References: 137

Authors

6 authors

Topics & keywords

Keywords
  • Computer science
  • Vocabulary
  • Artificial intelligence
  • Segmentation
  • Discriminative model
  • Panopticon
  • Image segmentation
  • Natural language processing
UN Sustainable Development Goals
  • Reduced inequalities