DiffusionCLIP: Text-Guided Diffusion Models for Robust Image Manipulation
Korea Advanced Institute of Science and Technology
Abstract
Recently, GAN inversion methods combined with Contrastive Language-Image Pretraining (CLIP) have enabled zero-shot image manipulation guided by text prompts. However, applying them to diverse real images remains difficult because of the limited capability of GAN inversion. Specifically, these approaches often struggle to reconstruct images with novel poses, views, and highly variable content compared with the training data, and they may alter object identity or produce unwanted image artifacts. To mitigate these problems and enable faithful manipulation of real images, we propose a novel method, dubbed DiffusionCLIP, that performs text-driven image manipulation using diffusion models. Based on full inversion…
Citation impact
- FWCI: 25.73
- Percentile: 100%
- References: 73
Authors
- 3
Topics & keywords
- Computer science
- Artificial intelligence
- Image (mathematics)
- Computer vision
- Image manipulation
- Inversion (geology)
- Code (set theory)
- Source code