Vector Quantized Diffusion Model for Text-to-Image Synthesis

Gu, Shuyang; Chen, Dong; Bao, Jianmin; Wen, Fang; Zhang, Bo; Chen, Dongdong; Yuan, Lu; Guo, Baining

doi:10.1109/cvpr52688.2022.01043

Article · 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) · Jun 1, 2022 · Closed access

University of Science and Technology of China · Microsoft (Germany) · +1 more institution

Indexed in Crossref

Abstract

We present the vector quantized diffusion (VQ-Diffusion) model for text-to-image generation. This method is based on a vector quantized variational autoencoder (VQ-VAE) whose latent space is modeled by a conditional variant of the recently developed Denoising Diffusion Probabilistic Model (DDPM). We find that this latent-space method is well-suited for text-to-image generation tasks because it not only eliminates the unidirectional bias with existing methods but also allows us to incorporate a mask-and-replace diffusion strategy to avoid the accumulation of errors, which is a serious problem with existing methods. Our experiments show that the VQ-Diffusion produces significantly better text-to-image generation…
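The mask-and-replace strategy named in the abstract can be illustrated with a minimal sketch of one corruption step over discrete VQ-VAE codebook indices. The abstract does not give the details, so everything here is an assumption made for illustration: the function name `mask_and_replace_step`, the `MASK` sentinel, and the two probabilities are hypothetical, and the sketch covers only the forward (noising) direction, not the learned text-conditioned denoiser.

```python
import random

MASK = -1  # hypothetical sentinel standing in for a [MASK] token id


def mask_and_replace_step(tokens, codebook_size, mask_prob, replace_prob, rng):
    """Apply one illustrative mask-and-replace corruption step.

    Each unmasked token is, independently:
      - turned into MASK with probability mask_prob,
      - resampled uniformly from the codebook with probability replace_prob,
      - kept unchanged otherwise.
    Tokens that are already masked stay masked.
    """
    if mask_prob < 0 or replace_prob < 0 or mask_prob + replace_prob > 1:
        raise ValueError("probabilities must be non-negative and sum to at most 1")
    corrupted = []
    for tok in tokens:
        if tok == MASK:
            corrupted.append(MASK)  # masking is absorbing in this sketch
            continue
        u = rng.random()
        if u < mask_prob:
            corrupted.append(MASK)
        elif u < mask_prob + replace_prob:
            # Uniform resampling may pick the original index again.
            corrupted.append(rng.randrange(codebook_size))
        else:
            corrupted.append(tok)
    return corrupted


if __name__ == "__main__":
    rng = random.Random(0)
    latent = [3, 7, 1, 0, 5, 2]  # made-up codebook indices for one image
    for step in range(3):
        latent = mask_and_replace_step(latent, 8, 0.3, 0.1, rng)
        print(step, latent)
```

Repeating the step drives the sequence toward all-masked tokens, which is one plausible reading of how a discrete diffusion process reaches a known end state; the schedule of probabilities across steps is left out because the abstract does not specify it.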

Citation impact

Total citations: 617

FWCI: 34.64
Percentile: 100%
References: 107

Authors: 8

Topics & keywords

Keywords

Diffusion
Computer science
Image quality
Image (mathematics)
Algorithm
Anisotropic diffusion
Artificial intelligence
Autoregressive model
