Abstract
The tremendous progress in neural image generation, coupled with the emergence of seemingly omnipotent vision-language models, has finally enabled text-based interfaces for creating and editing images. Handling generic images requires a diverse underlying generative model, hence the latest works utilize diffusion models, which were shown to surpass GANs in terms of diversity. One major drawback of diffusion models, however, is their relatively slow inference time. In this paper, we present an accelerated solution to the task of local text-driven editing of generic images, where the desired edits are confined to a user-provided mask. Our solution leverages a text-to-image Latent Diffusion Model (LDM), which…
Citation impact
- Total citations: 297
- FWCI: 33.69
- Percentile: 100%
- References: 20
Authors: 3
Topics & keywords
Topics
Keywords
- Computer science
- Diffusion
- Image editing
- Task (project management)
- Inference
- Image (mathematics)
- Artificial intelligence
- Generative model