DiffusionCLIP: Text-Guided Diffusion Models for Robust Image Manipulation

Korea Advanced Institute of Science and Technology

Indexed in Crossref

Abstract

Recently, GAN inversion methods combined with Contrastive Language-Image Pretraining (CLIP) enable zero-shot image manipulation guided by text prompts. However, their applications to diverse real images are still difficult due to the limited GAN inversion capability. Specifically, these approaches often have difficulties in reconstructing images with novel poses, views, and highly variable contents compared to the training data, altering object identity, or producing unwanted image artifacts. To mitigate these problems and enable faithful manipulation of real images, we propose a novel method, dubbed DiffusionCLIP, that performs text-driven image manipulation using diffusion models. Based on full inversion…

Citation impact

Total citations: 470
FWCI: 25.73
Percentile: 100%
References: 73

Authors

3

Topics & keywords

Keywords
  • Computer science
  • Artificial intelligence
  • Image (mathematics)
  • Computer vision
  • Image manipulation
  • Inversion (geology)
  • Code (set theory)
  • Source code