CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields
City University of Hong Kong · Snap (United States) · 3 other institutions
Abstract
We present CLIP-NeRF, a multi-modal 3D object manipulation method for neural radiance fields (NeRF). By leveraging the joint language-image embedding space of the recent Contrastive Language-Image Pre-Training (CLIP) model, we propose a unified framework that allows manipulating NeRF in a user-friendly way, using either a short text prompt or an exemplar image. Specifically, to combine the novel view synthesis capability of NeRF and the controllable manipulation ability of latent representations from generative models, we introduce a disentangled conditional NeRF architecture that allows individual control over both shape and appearance. This is achieved by performing the shape conditioning via applying a…
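The abstract's central architectural idea, separate control over shape and appearance, can be illustrated with a toy conditional field. The sketch below is a hypothetical NumPy illustration, not the authors' implementation: the names `ToyConditionalField` and `clip_style_loss`, the layer sizes, the random weights, and the use of plain concatenation for shape conditioning are all assumptions (the abstract is cut off before it finishes describing the actual shape-conditioning mechanism). It routes a shape code `z_shape` into the density branch and an appearance code `z_app` only into the color head. `clip_style_loss` assumes a simple one-minus-cosine objective to show why a text prompt and an exemplar image can drive the same edit: both are compared in one shared embedding space. The embeddings used here are stand-in vectors, not CLIP outputs.

```python
import numpy as np


def positional_encoding(x, num_freqs=4):
    """Map (N, 3) points to (N, 3 + 6 * num_freqs) sinusoidal features."""
    feats = [x]
    for i in range(num_freqs):
        feats.append(np.sin((2.0 ** i) * np.pi * x))
        feats.append(np.cos((2.0 ** i) * np.pi * x))
    return np.concatenate(feats, axis=-1)


class ToyConditionalField:
    """Hypothetical toy field: z_shape feeds the density branch, z_app only the color head."""

    def __init__(self, shape_dim=8, app_dim=8, hidden=32, num_freqs=4, seed=0):
        rng = np.random.default_rng(seed)
        enc_dim = 3 + 6 * num_freqs
        self.num_freqs = num_freqs
        self.w_hidden = rng.normal(0.0, 0.1, (enc_dim + shape_dim, hidden))
        self.w_sigma = rng.normal(0.0, 0.1, (hidden, 1))
        self.w_rgb = rng.normal(0.0, 0.1, (hidden + app_dim, 3))

    def __call__(self, x, z_shape, z_app):
        n = x.shape[0]
        enc = positional_encoding(x, self.num_freqs)
        # Assumed simplification: shape code is concatenated with encoded positions.
        h_in = np.concatenate([enc, np.tile(z_shape, (n, 1))], axis=-1)
        h = np.maximum(0.0, h_in @ self.w_hidden)  # ReLU hidden features
        sigma = np.logaddexp(0.0, h @ self.w_sigma)  # softplus: non-negative density
        # Appearance code enters only after density is computed, so it cannot alter sigma.
        rgb_in = np.concatenate([h, np.tile(z_app, (n, 1))], axis=-1)
        rgb = 1.0 / (1.0 + np.exp(-(rgb_in @ self.w_rgb)))  # sigmoid maps color into (0, 1)
        return sigma, rgb


def clip_style_loss(image_emb, target_emb):
    """Assumed loss: 1 - cosine similarity; target_emb may come from a text or image encoder."""
    a = image_emb / np.linalg.norm(image_emb)
    b = target_emb / np.linalg.norm(target_emb)
    return 1.0 - float(a @ b)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    field = ToyConditionalField()
    pts = rng.uniform(-1.0, 1.0, (5, 3))
    z_s = rng.normal(size=8)
    sigma_a, rgb_a = field(pts, z_s, rng.normal(size=8))
    sigma_b, rgb_b = field(pts, z_s, rng.normal(size=8))
    print(np.allclose(sigma_a, sigma_b))  # True: an appearance edit leaves density unchanged
```

Swapping `z_app` while holding `z_shape` fixed leaves `sigma` identical, which is the kind of disentanglement the abstract describes. In a full pipeline the rendered image would first be encoded by an image encoder before a loss like `clip_style_loss` is computed against the prompt or exemplar embedding.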
Citation impact
- FWCI (field-weighted citation impact): 25.05
- Percentile: 100%
- References: 54
Authors: 5
Topics & keywords
- Computer science
- Artificial intelligence
- Embedding
- Computer vision
- Quality Education