Blended Diffusion for Text-driven Editing of Natural Images

Hebrew University of Jerusalem · Brandman University

Indexed in: arXiv, Crossref

Abstract

Natural language offers a highly intuitive interface for image editing. In this paper, we introduce the first solution for performing local (region-based) edits in generic natural images, based on a natural language description along with an ROI mask. We achieve our goal by leveraging and combining a pretrained language-image model (CLIP), to steer the edit towards a user-provided text prompt, with a denoising diffusion probabilistic model (DDPM) to generate natural-looking results. To seamlessly fuse the edited region with the unchanged parts of the image, we spatially blend noised versions of the input image with the local text-guided diffusion latent at a progression of noise levels. In addition, we show…
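The blending idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The names `make_alpha_bar`, `noise_image`, `blend_step`, and `blended_edit`, the linear noise schedule, and the step count are all assumptions made for illustration, and `guided_denoise` is a hypothetical stand-in for one CLIP-guided DDPM denoising step, which is not implemented here. Only the forward noising formula is the standard DDPM one.

```python
import numpy as np

def make_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    # Assumed linear beta schedule; returns the cumulative products alpha_bar_t.
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def noise_image(x0, t, alpha_bar, rng):
    # Standard DDPM forward process: sample x_t given the clean image x0.
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

def blend_step(x_fg, x0, mask, t, alpha_bar, rng):
    # Keep the text-guided latent inside the ROI mask and a noised copy
    # of the input image (at the same noise level) outside it.
    x_bg = noise_image(x0, t, alpha_bar, rng)
    return mask * x_fg + (1.0 - mask) * x_bg

def blended_edit(x0, mask, guided_denoise, num_steps=50, seed=0):
    # guided_denoise(x, t) is a hypothetical callable standing in for one
    # text-guided denoising step; it must return an array shaped like x.
    rng = np.random.default_rng(seed)
    alpha_bar = make_alpha_bar(num_steps)
    x = rng.standard_normal(x0.shape)  # start from pure noise
    for t in range(num_steps - 1, 0, -1):
        x_fg = guided_denoise(x, t)
        x = blend_step(x_fg, x0, mask, t - 1, alpha_bar, rng)
    # Assumption: at the final step the region outside the mask is taken
    # from the unnoised input image, so it stays unchanged.
    x_fg = guided_denoise(x, 0)
    return mask * x_fg + (1.0 - mask) * x0
```

With a mask of shape `(H, W, 1)` and an image of shape `(H, W, 3)`, broadcasting applies the same mask to every channel. Because the background is re-noised to match the current noise level at every step, the generated region and the original surroundings are always combined at comparable noise levels, which is the mechanism the abstract credits for the seamless fusion.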

Citation impact

  • Total citations: 683
  • FWCI: 37.71
  • Percentile: 100%
  • References: 77

Authors

3

Topics & keywords

Keywords
  • Computer science
  • Natural (archaeology)
  • Diffusion
  • Image editing
  • Computer graphics (images)
  • Artificial intelligence
  • World Wide Web
  • Multimedia

Funding