StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing
Nankai University · Universitat de Barcelona · +1 more institution
Abstract
A significant research effort is focused on exploiting the outstanding capacities of pretrained diffusion models for image editing. Existing approaches either fine-tune the model or invert the image in the latent space of the pretrained model. However, they suffer from two problems: (i) unsatisfactory results in selected regions and unexpected changes in non-selected regions, and (ii) the need for careful text prompt editing, since the prompt should include all visual objects in the input image. To address this, we propose two improvements: (i) optimizing only the input of the value linear network in the cross-attention layers, which is sufficiently powerful to reconstruct a real image, and (ii) attention regularization to…
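The first improvement can be illustrated with a minimal sketch of a single cross-attention layer in which the key and value projections read from separate inputs. This is not the authors' implementation: the function names, the toy dimensions, the weight values, and the choice to keep the key input fixed to the prompt embedding are all assumptions made for illustration. It only shows the general property that changing the value input changes the layer output while leaving the attention map untouched.

```python
import math

def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax_rows(m):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(v - mx) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def cross_attention(queries, key_input, value_input, w_k, w_v):
    """Cross-attention whose keys and values come from separate inputs.

    Hypothetical helper: returns (attention map, layer output).
    """
    keys = matmul(key_input, w_k)
    values = matmul(value_input, w_v)
    scale = math.sqrt(len(keys[0]))
    scores = [[sum(q * k for q, k in zip(q_row, k_row)) / scale
               for k_row in keys] for q_row in queries]
    attn = softmax_rows(scores)
    return attn, matmul(attn, values)

# Toy data (assumed): 2 image tokens, 2 text tokens, embedding size 3.
queries = [[1.0, 0.0], [0.0, 1.0]]
prompt_emb = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
w_k = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
w_v = [[0.2, 0.1], [0.3, 0.4], [0.5, 0.6]]

# Stand-in for an optimized value input; only this tensor would be trained.
value_emb = [[0.9, 0.1, 0.5], [0.0, 1.0, 0.5]]

attn_a, out_a = cross_attention(queries, prompt_emb, prompt_emb, w_k, w_v)
attn_b, out_b = cross_attention(queries, prompt_emb, value_emb, w_k, w_v)

print(attn_a == attn_b)  # attention map depends only on queries and keys
print(out_a == out_b)    # output changes with the value input
```

In a real pipeline the value input would be updated by gradient descent against a reconstruction loss; the sketch replaces that step with a hand-edited `value_emb` to keep the example dependency-free.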
Citation impact
- FWCI: 0.00
- Percentile: 99%
- References: 0
Authors: 8
Topics & keywords
- Image editing
- Computer science
- Embedding
- Artificial intelligence
- Regularization (linguistics)
- Classifier (UML)
- Image (mathematics)
- Computer vision