StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing

Li, Senmao; Weijer, Joost van de; Hu, Taihang; Khan, Fahad Shahbaz; Hou, Qibin; Wang, Yaxing; Yang, Jian; Cheng, Ming-Ming

doi:10.26599/cvm.2025.9450462

Preprint · Computational Visual Media · Apr 17, 2026 · Diamond OA

Nankai University · Universitat de Barcelona · +1 more institution

Indexed in: arXiv, Crossref, DataCite, DOAJ

Abstract

A significant research effort is focused on exploiting the outstanding capacities of pretrained diffusion models for image editing. Existing approaches either fine-tune the model or invert the image into the latent space of the pretrained model. However, they suffer from two problems: (i) unsatisfactory results in selected regions and unexpected changes in non-selected regions, and (ii) the need for careful text prompt editing, since the prompt should include all visual objects in the input image. To address this, we propose two improvements: (i) optimizing only the input of the value linear network in the cross-attention layers, which is sufficiently powerful to reconstruct a real image, and (ii) attention regularization to…
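
The first improvement can be illustrated with a minimal NumPy sketch of a single cross-attention head in which the key and value projections receive separate inputs. This is not the authors' implementation: the function name `cross_attention`, all shapes, the random weights, and the perturbed copy standing in for an optimized embedding are assumptions made for illustration. In particular, the abstract only states that the input of the value linear network is optimized; keeping the original text embedding as the key input is an assumption of this sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(image_feats, key_input, value_input, W_q, W_k, W_v):
    # Hypothetical single-head cross-attention with separate key/value inputs.
    q = image_feats @ W_q          # queries come from image features
    k = key_input @ W_k            # keys come from the key input
    v = value_input @ W_v          # values come from the value input
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return attn @ v, attn

# Illustrative sizes and random weights (assumptions, not from the paper).
rng = np.random.default_rng(0)
n_pixels, n_tokens, d_img, d_txt, d_head = 16, 4, 8, 6, 5
image_feats = rng.normal(size=(n_pixels, d_img))
text_emb = rng.normal(size=(n_tokens, d_txt))
W_q = rng.normal(size=(d_img, d_head))
W_k = rng.normal(size=(d_txt, d_head))
W_v = rng.normal(size=(d_txt, d_head))

# Standard cross-attention: the text embedding feeds both keys and values.
out_std, attn_std = cross_attention(image_feats, text_emb, text_emb, W_q, W_k, W_v)

# Sketch of the idea: only the value input changes. A perturbed copy stands in
# for an embedding that would, in practice, be optimized to reconstruct the image.
learned_value_input = text_emb + 0.1 * rng.normal(size=text_emb.shape)
out_new, attn_new = cross_attention(
    image_feats, text_emb, learned_value_input, W_q, W_k, W_v
)
```

In this sketch the attention maps `attn_std` and `attn_new` are identical because the keys are unchanged, while the layer outputs differ. Under the stated assumption, this shows how changing only the value input alters what each attended token contributes without moving where attention falls.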

Citation impact

Total citations: 12

FWCI: 0.00
Percentile: 99%
References: 0

Authors: 8

Topics & keywords

Keywords

Image editing
Computer science
Embedding
Artificial intelligence
Regularization (linguistics)
Classifier (UML)
Image (mathematics)
Computer vision
