Vector Quantized Diffusion Model for Text-to-Image Synthesis
University of Science and Technology of China · Microsoft (Germany) · +1 more institution
Abstract
We present the vector quantized diffusion (VQ-Diffusion) model for text-to-image generation. The method is based on a vector quantized variational autoencoder (VQ-VAE) whose latent space is modeled by a conditional variant of the recently developed Denoising Diffusion Probabilistic Model (DDPM). We find that this latent-space method is well suited to text-to-image generation because it not only eliminates the unidirectional bias of existing methods but also lets us incorporate a mask-and-replace diffusion strategy that avoids the accumulation of errors, a serious problem with existing methods. Our experiments show that VQ-Diffusion produces significantly better text-to-image generation…
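The abstract names the mask-and-replace strategy without defining it. The sketch below assumes one forward (noising) step over the discrete VQ-VAE codes works like this. Each unmasked code is kept with probability alpha. With total probability K*beta it is resampled uniformly from the K-entry codebook, and with probability gamma it becomes a special [MASK] token that never reverts. The function names, the use of index K for [MASK], and the fixed schedule values are hypothetical, chosen for illustration. The full method varies the schedule with the timestep, so treat this as a sketch under those assumptions rather than the authors' implementation.

```python
import random


def transition_matrix(K, alpha, beta, gamma):
    """Build a (K+1) x (K+1) matrix Q with Q[i][j] = q(x_t = j | x_{t-1} = i).

    Hypothetical sketch: index K stands for the [MASK] token, and the
    probabilities are assumed to satisfy alpha + K*beta + gamma == 1.
    """
    assert abs(alpha + K * beta + gamma - 1.0) < 1e-9, "probabilities must sum to 1"
    Q = [[0.0] * (K + 1) for _ in range(K + 1)]
    for i in range(K):
        for j in range(K):
            Q[i][j] = beta  # "replace" branch: uniform over all K codes
        Q[i][i] += alpha    # "keep" branch adds alpha to the current code
        Q[i][K] = gamma     # "mask" branch sends the remaining mass to [MASK]
    Q[K][K] = 1.0           # [MASK] is absorbing: a masked token stays masked
    return Q


def corrupt_step(tokens, K, alpha, beta, gamma, rng):
    """Sample x_t from x_{t-1} token by token, using the same probabilities as Q."""
    mask_id = K
    out = []
    for tok in tokens:
        if tok == mask_id:
            out.append(mask_id)  # masked positions never come back
            continue
        u = rng.random()
        if u < gamma:
            out.append(mask_id)           # mask branch (probability gamma)
        elif u < gamma + K * beta:
            out.append(rng.randrange(K))  # replace branch (probability K*beta)
        else:
            out.append(tok)               # keep branch (probability alpha)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    K = 8  # hypothetical tiny codebook
    Q = transition_matrix(K, alpha=0.9, beta=0.01, gamma=0.02)
    print([round(sum(row), 6) for row in Q])  # each row is a distribution
    x = [0, 3, 5, 7, 2]
    for _ in range(10):
        x = corrupt_step(x, K, 0.9, 0.01, 0.02, rng)
    print(x)  # some codes replaced, some masked (index 8)
```

One way to read the design: the absorbing [MASK] tells the reverse model exactly which positions were erased. The uniform replace branch forces it to re-check tokens that still look valid. That pressure to re-check is the intuition behind the claim that the strategy limits error accumulation.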
Citation impact
- FWCI: 34.64
- Percentile: 100%
- References: 107
Authors: 8
Topics & keywords
- Diffusion
- Computer science
- Image quality
- Image (mathematics)
- Algorithm
- Anisotropic diffusion
- Artificial intelligence
- Autoregressive model