Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models

Xu, Jiarui; Liu, Sifei; Vahdat, Arash; Byeon, Wonmin; Wang, Xiaolong; De Mello, Shalini

doi:10.1109/cvpr52729.2023.00289

Article · Jun 1, 2023 · Closed access

Jiarui Xu, Sifei Liu, Arash Vahdat, Wonmin Byeon, Xiaolong Wang

UC San Diego Health System

Indexed in Crossref

Abstract

We present ODISE: Open-vocabulary DIffusion-based panoptic SEgmentation, which unifies pre-trained text-image diffusion and discriminative models to perform open-vocabulary panoptic segmentation. Text-to-image diffusion models have the remarkable ability to generate high-quality images with diverse open-vocabulary language descriptions. This demonstrates that their internal representation space is highly correlated with open concepts in the real world. Text-image discriminative models like CLIP, on the other hand, are good at classifying images into open-vocabulary labels. We leverage the frozen internal representations of both these models to perform panoptic segmentation of any category in the wild. Our…

Citation impact

Total citations: 312

FWCI: 35.52
Percentile: 100%
References: 137

Authors: 6

Topics & keywords

Keywords

Computer science
Vocabulary
Artificial intelligence
Segmentation
Discriminative model
Panopticon
Image segmentation
Natural language processing

UN Sustainable Development Goals

Reduced inequalities
