High-Resolution Image Synthesis with Latent Diffusion Models
Ludwig-Maximilians-Universität München · Heidelberg University
Abstract
By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image data and beyond. Additionally, their formulation allows for a guiding mechanism to control the image generation process without retraining. However, since these models typically operate directly in pixel space, optimization of powerful DMs often consumes hundreds of GPU days and inference is expensive due to sequential evaluations. To enable DM training on limited computational resources while retaining their quality and flexibility, we apply them in the latent space of powerful pretrained autoencoders. In contrast to previous work,…
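The idea of running the diffusion process in a compressed latent space rather than in pixel space can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: `toy_encode` is a hypothetical stand-in for a pretrained encoder (it merely average-pools), the downsampling factor of 8 is arbitrary, and the linear noise schedule is a common default choice. The sketch only shows that the forward noising step acts on far fewer values when applied to a latent.

```python
import numpy as np

def toy_encode(image, factor=8):
    """Hypothetical stand-in for a pretrained encoder: average-pool by `factor`."""
    h, w, c = image.shape
    blocks = image.reshape(h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(1, 3))

def add_noise(z0, t, alphas_cumprod, rng):
    """Forward diffusion step: sample z_t from q(z_t | z_0) for a latent z_0."""
    a_bar = alphas_cumprod[t]
    eps = rng.standard_normal(z0.shape)
    z_t = np.sqrt(a_bar) * z0 + np.sqrt(1.0 - a_bar) * eps
    return z_t, eps

rng = np.random.default_rng(0)
image = rng.random((256, 256, 3))          # placeholder "image" in pixel space

# Assumed linear beta schedule with 1000 steps (illustrative only).
betas = np.linspace(1e-4, 0.02, 1000)
alphas_cumprod = np.cumprod(1.0 - betas)

z0 = toy_encode(image)                     # latent of shape (32, 32, 3)
z_t, eps = add_noise(z0, 500, alphas_cumprod, rng)

# A denoiser would be trained to predict `eps` from `z_t` and the step index;
# here we only compare how many values each space contains.
print(image.size, z0.size)
```

In this toy setup the latent holds 64 times fewer values than the image, which is the kind of reduction that would make each denoising evaluation cheaper; a real autoencoder would learn the compression instead of pooling.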
Citation impact
- FWCI: 716.81
- Percentile: 100%
- References: 165
Authors
- Robin Rombach (corresponding author)
Ludwig-Maximilians-Universität München, Heidelberg University
- Andreas Blattmann
Heidelberg University, Ludwig-Maximilians-Universität München
- Dominik Lorenz
Heidelberg University, Ludwig-Maximilians-Universität München
- Patrick Esser
- Björn Ommer
Heidelberg University, Ludwig-Maximilians-Universität München
Topics & keywords
- Computer science
- Artificial intelligence
- Pixel
- Inference
- Inpainting
- Image translation
- Computer vision
- Image (mathematics)