SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis

Podell, Dustin; English, Zion; Lacey, Kyle; Blattmann, Andreas; Dockhorn, Tim; Müller, Jonas; Penna, Joe; Rombach, Robin

doi:10.48550/arxiv.2307.01952

Preprint, arXiv (Cornell University), Jul 4, 2023, Green OA

Indexed in: arXiv, DataCite

Abstract

We present SDXL, a latent diffusion model for text-to-image synthesis. Compared to previous versions of Stable Diffusion, SDXL leverages a three times larger UNet backbone: the increase in model parameters is mainly due to more attention blocks and a larger cross-attention context, as SDXL uses a second text encoder. We design multiple novel conditioning schemes and train SDXL on multiple aspect ratios. We also introduce a refinement model that improves the visual fidelity of samples generated by SDXL using a post-hoc image-to-image technique. We demonstrate that SDXL shows drastically improved performance compared to previous versions of Stable Diffusion and achieves results competitive with those…

Citation impact

Total citations: 309

FWCI: —
Percentile: —
References: 0

Authors: 8

Topics & keywords

Keywords

Computer science
Fidelity
Encoder
Image (mathematics)
Code (set theory)
Context (archaeology)
Transparency (behavior)
Generative model