Blended Diffusion for Text-driven Editing of Natural Images
Hebrew University of Jerusalem · Reichman University
Abstract
Natural language offers a highly intuitive interface for image editing. In this paper, we introduce the first solution for performing local (region-based) edits in generic natural images, based on a natural language description along with an ROI mask. We achieve our goal by leveraging and combining a pretrained language-image model (CLIP), to steer the edit towards a user-provided text prompt, with a denoising diffusion probabilistic model (DDPM) to generate natural-looking results. To seamlessly fuse the edited region with the unchanged parts of the image, we spatially blend noised versions of the input image with the local text-guided diffusion latent at a progression of noise levels. In addition, we show…
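The blending step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the linear noise schedule, the start from pure Gaussian noise, and the names `make_alpha_bar`, `noise_image`, `blended_edit`, and `guided_step` are all assumptions made for illustration. In particular, `guided_step` is a hypothetical stand-in for one CLIP-guided DDPM denoising step, which is not reproduced here.

```python
import numpy as np


def make_alpha_bar(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Cumulative signal fraction per step for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def noise_image(x0, alpha_bar_t, rng):
    """Forward-diffuse the clean image x0 to the noise level given by alpha_bar_t."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps


def blended_edit(x0, mask, guided_step, num_steps=50, seed=0):
    """Blend a guided latent (inside the mask) with noised input (outside it).

    guided_step(x, t) is a hypothetical placeholder for one text-guided
    denoising step; any callable returning an array shaped like x works.
    """
    rng = np.random.default_rng(seed)
    alpha_bar = make_alpha_bar(num_steps)
    x = rng.standard_normal(x0.shape)  # assumption: start from pure noise
    for t in range(num_steps - 1, -1, -1):
        x_fg = guided_step(x, t)  # foreground: text-guided latent
        # Background: the input noised to the matching level; at the last
        # step the noise level is zero, so the clean input is used directly.
        x_bg = noise_image(x0, alpha_bar[t - 1], rng) if t > 0 else x0
        x = mask * x_fg + (1.0 - mask) * x_bg  # spatial blend
    return x


# Toy usage with a dummy "denoiser" that only shrinks the latent.
image = np.random.default_rng(1).random((8, 8, 3))
mask = np.zeros((8, 8, 1))
mask[2:6, 2:6] = 1.0  # region of interest to edit
edited = blended_edit(image, mask, guided_step=lambda x, t: 0.9 * x)
```

Because the final blend uses the clean input outside the mask, pixels outside the ROI are returned unchanged in this sketch, while the masked region is determined entirely by the guided steps.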
Citation impact
- FWCI: 37.71
- Percentile: 100%
- References: 77
Authors
- 3
Topics & keywords
- Computer science
- Natural (archaeology)
- Diffusion
- Image editing
- Computer graphics (images)
- Artificial intelligence
- World Wide Web
- Multimedia