High-Resolution Image Synthesis with Latent Diffusion Models

Rombach, Robin; Blattmann, Andreas; Lorenz, Dominik; Esser, Patrick; Ommer, Björn

doi:10.1109/cvpr52688.2022.01042

Article · 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) · Jun 1, 2022 · Closed access

Ludwig-Maximilians-Universität München · Heidelberg University

Indexed in Crossref

Abstract

By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image data and beyond. Additionally, their formulation allows for a guiding mechanism to control the image generation process without retraining. However, since these models typically operate directly in pixel space, optimization of powerful DMs often consumes hundreds of GPU days and inference is expensive due to sequential evaluations. To enable DM training on limited computational resources while retaining their quality and flexibility, we apply them in the latent space of powerful pretrained autoencoders. In contrast to previous work,…

Citation impact

Total citations: 13,392

FWCI: 716.81
Percentile: 100%
References: 165

Authors: 5

Topics & keywords

Keywords

Computer science
Artificial intelligence
Pixel
Inference
Inpainting
Image translation
Computer vision
Image (mathematics)

Funding

California Department of Fish and Game
Award: 421703927.