CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields

Wang, Can; Chai, Menglei; He, Mingming; Chen, Dongdong; Liao, Jing

doi:10.1109/cvpr52688.2022.00381

Article · 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) · Jun 1, 2022 · Closed access

Can Wang · Menglei Chai · Mingming He · Dongdong Chen · Jing Liao

City University of Hong Kong · Snap (United States) · +3 more institutions

Indexed in Crossref

Abstract

We present CLIP-NeRF, a multi-modal 3D object manipulation method for neural radiance fields (NeRF). By leveraging the joint language-image embedding space of the recent Contrastive Language-Image Pre-Training (CLIP) model, we propose a unified framework that allows manipulating NeRF in a user-friendly way, using either a short text prompt or an exemplar image. Specifically, to combine the novel view synthesis capability of NeRF and the controllable manipulation ability of latent representations from generative models, we introduce a disentangled conditional NeRF architecture that allows individual control over both shape and appearance. This is achieved by performing the shape conditioning via applying a…

Citation impact

Total citations: 316

FWCI: 25.05
Percentile: 100%
References: 54

Authors: 5

Topics & keywords

Keywords

Computer science
Artificial intelligence
Embedding
Computer vision

UN Sustainable Development Goals

Quality Education
