Preprint · Computational Visual Media · Apr 17, 2026 · Diamond OA

StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing

Nankai University · Universitat de Barcelona · +1 more institution

Indexed in: arXiv, Crossref, DataCite, DOAJ

Abstract

A significant research effort is focused on exploiting the outstanding capacities of pretrained diffusion models for image editing. Existing approaches either fine-tune the model or invert the image in the latent space of the pretrained model. However, they suffer from two problems: (i) unsatisfactory results in selected regions and unexpected changes in non-selected regions, and (ii) the need for careful text prompt editing, since the prompt should include all visual objects in the input image. To address these problems, we propose two improvements: (i) optimizing only the input of the value linear network in the cross-attention layers, which is sufficiently powerful to reconstruct a real image, and (ii) attention regularization to…
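As a rough illustration of improvement (i), the toy sketch below runs a single cross-attention step in which the key projection and the value projection receive separate inputs. It is not the authors' implementation: all names (`cross_attention`, `prompt_embedding`, `learned_value_input`, `w_k`, `w_v`) and all numbers are hypothetical, and no optimization is performed. It only shows a general property of attention: the attention map depends on the queries and keys alone, so the input to the value projection can change without altering that map.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, key_input, value_input, w_k, w_v):
    """Single-head cross-attention with separate inputs for keys and values."""
    keys = matmul(key_input, w_k)        # (tokens, d)
    values = matmul(value_input, w_v)    # (tokens, d)
    scale = 1.0 / math.sqrt(len(keys[0]))
    keys_t = [list(col) for col in zip(*keys)]
    scores = matmul(queries, keys_t)     # (pixels, tokens)
    attn = [softmax([s * scale for s in row]) for row in scores]
    return attn, matmul(attn, values)

# Hypothetical toy data: 2 image queries, 2 prompt tokens, embedding size 3.
queries = [[1.0, 0.0], [0.0, 1.0]]
prompt_embedding = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
w_k = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
w_v = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Stand-in for an embedding that would be optimized during inversion (assumption).
learned_value_input = [[0.2, 0.9, 0.1], [0.7, 0.3, 0.5]]

# Baseline: the prompt embedding feeds both the key and the value projection.
attn_a, out_a = cross_attention(queries, prompt_embedding, prompt_embedding, w_k, w_v)
# Variant: only the value projection receives a different input.
attn_b, out_b = cross_attention(queries, prompt_embedding, learned_value_input, w_k, w_v)

print("attention maps identical:", attn_a == attn_b)
print("outputs differ:", out_a != out_b)
```

In this sketch the keys always come from the prompt embedding, so swapping the value-side input changes the attended output while leaving the attention weights untouched.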

Citation impact

  • Total citations: 12
  • FWCI: 0.00
  • Percentile: 99%
  • References: 0

Authors

8 authors

Topics & keywords

Keywords
  • Image editing
  • Computer science
  • Embedding
  • Artificial intelligence
  • Regularization (linguistics)
  • Classifier (UML)
  • Image (mathematics)
  • Computer vision