SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis
Indexed in arXiv, DataCite
Abstract
We present SDXL, a latent diffusion model for text-to-image synthesis. Compared to previous versions of Stable Diffusion, SDXL leverages a three times larger UNet backbone; the increase in model parameters is mainly due to more attention blocks and a larger cross-attention context, as SDXL uses a second text encoder. We design multiple novel conditioning schemes and train SDXL on multiple aspect ratios. We also introduce a refinement model which is used to improve the visual fidelity of samples generated by SDXL using a post-hoc image-to-image technique. We demonstrate that SDXL shows drastically improved performance compared to the previous versions of Stable Diffusion and achieves results competitive with those…
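The abstract states that SDXL is trained on multiple aspect ratios but does not say how the training images are grouped. One common approach, assumed here rather than taken from the abstract, is aspect-ratio bucketing: enumerate a fixed set of (height, width) resolutions with roughly constant pixel count, send each image to the bucket with the closest aspect ratio, then resize and center-crop it to that bucket. The Python sketch below illustrates that idea under stated assumptions; the 1024 x 1024 pixel budget, the multiples of 64, the 512 to 2048 side limits, and the function names `make_buckets`, `assign_bucket` and `resize_and_crop` are all hypothetical and are not the authors' implementation.

```python
import math
from typing import List, Tuple

# Hypothetical sketch parameters; the abstract does not give these values.
TARGET_PIXELS = 1024 * 1024  # keep each bucket's area near 1024 x 1024
STEP = 64                    # bucket sides are multiples of 64
MIN_SIDE, MAX_SIDE = 512, 2048


def make_buckets(target_pixels: int = TARGET_PIXELS, step: int = STEP,
                 min_side: int = MIN_SIDE,
                 max_side: int = MAX_SIDE) -> List[Tuple[int, int]]:
    """Enumerate (height, width) buckets whose area stays close to target_pixels."""
    buckets = set()
    for height in range(min_side, max_side + 1, step):
        # Width, rounded to the nearest multiple of `step`, that brings
        # height * width closest to the pixel budget.
        width = round(target_pixels / (height * step)) * step
        if min_side <= width <= max_side:
            # Add the bucket and its transpose so portrait and landscape
            # shapes are covered symmetrically.
            buckets.add((height, width))
            buckets.add((width, height))
    return sorted(buckets)


def assign_bucket(height: int, width: int,
                  buckets: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Pick the bucket whose aspect ratio is closest to the image's, in log space."""
    log_ar = math.log(height / width)
    return min(buckets, key=lambda b: abs(math.log(b[0] / b[1]) - log_ar))


def resize_and_crop(height: int, width: int,
                    bucket: Tuple[int, int]) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Scale the image so it covers the bucket, then center-crop to the bucket.

    Returns ((resized_h, resized_w), (crop_top, crop_left)).
    """
    bucket_h, bucket_w = bucket
    scale = max(bucket_h / height, bucket_w / width)
    # max() guards against float rounding leaving a side one pixel short.
    resized_h = max(bucket_h, round(height * scale))
    resized_w = max(bucket_w, round(width * scale))
    crop_top = (resized_h - bucket_h) // 2
    crop_left = (resized_w - bucket_w) // 2
    return (resized_h, resized_w), (crop_top, crop_left)


if __name__ == "__main__":
    all_buckets = make_buckets()
    chosen = assign_bucket(1080, 1920, all_buckets)  # a 16:9 landscape image
    print(chosen, resize_and_crop(1080, 1920, chosen))
```

With these assumptions, a 1080 x 1920 (height x width) landscape image lands in the 768 x 1344 bucket, is resized to 768 x 1365, and is center-cropped with offsets (0, 10). Comparing aspect ratios in log space treats 2:1 and 1:2 as equally far from square. The crop offsets are the kind of values a crop-conditioning scheme could pass to the model, though the abstract does not say which conditioning signals SDXL actually uses.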
Citation impact
- 309 total citations
- References: 0
Authors
- 8 authors
Topics & keywords
Keywords
- Computer science
- Fidelity
- Encoder
- Image
- Code
- Context
- Transparency
- Generative model