DiffusionCLIP: Text-Guided Diffusion Models for Robust Image Manipulation

Kim, Gwanghyun; Kwon, Tae‐Sung; Ye, Jong Chul

doi:10.1109/cvpr52688.2022.00246

Article · 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) · Jun 1, 2022 · Closed access

Korea Advanced Institute of Science and Technology

Indexed in Crossref

Abstract

Recently, GAN inversion methods combined with Contrastive Language-Image Pretraining (CLIP) have enabled zero-shot image manipulation guided by text prompts. However, applying them to diverse real images remains difficult because of limited GAN inversion capability. Specifically, these approaches often have difficulty reconstructing images with novel poses, views, and highly variable contents compared to the training data, alter object identity, or produce unwanted image artifacts. To mitigate these problems and enable faithful manipulation of real images, we propose a novel method, dubbed DiffusionCLIP, that performs text-driven image manipulation using diffusion models. Based on full inversion…

Citation impact

Total citations: 470
FWCI: 25.73
Percentile: 100%
References: 73

Authors: 3

Topics & keywords

Keywords

Computer science
Artificial intelligence
Image (mathematics)
Computer vision
Image manipulation
Inversion (geology)
Code (set theory)
Source code

No related works found for this paper.