SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation
University of Illinois Urbana-Champaign · Snap (United States)
Abstract
In this work, we present a novel framework built to simplify 3D asset generation for amateur users. To enable interactive generation, our method supports a variety of input modalities that a human can easily provide, including images, text, partially observed shapes, and combinations of these, and it further allows the user to adjust the strength of each input. At the core of our approach is an encoder-decoder that compresses 3D shapes into a compact latent representation, upon which a diffusion model is learned. To support a variety of multimodal inputs, we employ task-specific encoders with dropout, followed by a cross-attention mechanism. Due to its flexibility, our model naturally supports a variety of tasks,…
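The abstract names three mechanisms: cross-attention from the latent to condition tokens, dropout on the task-specific encoders, and a per-input strength control. The NumPy sketch below shows one plausible way these fit together. It is not the authors' implementation. It assumes that "dropout" means condition (modality) dropout with an all-zero null embedding, and that per-input strength is a classifier-free-guidance-style weighted sum, $\hat\epsilon = \epsilon(z \mid \varnothing) + \sum_i w_i\,[\epsilon(z \mid c_i) - \epsilon(z \mid \varnothing)]$. All function names (`cross_attention`, `drop_modalities`, `guided_eps`, `toy_eps`) are hypothetical, and the toy denoiser omits timestep conditioning.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    queries: (n, d) latent tokens; context: (m, d) condition tokens.
    """
    q = queries @ w_q                          # (n, d_head)
    k = context @ w_k                          # (m, d_head)
    v = context @ w_v                          # (m, d_head)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (n, m)
    return softmax(scores, axis=-1) @ v        # (n, d_head)

def drop_modalities(conds, p_drop, rng):
    """Training-time modality dropout (assumed): each condition embedding is
    replaced by an all-zero 'null' embedding with probability p_drop."""
    return {name: (np.zeros_like(c) if rng.random() < p_drop else c)
            for name, c in conds.items()}

def guided_eps(eps_fn, z, conds, weights):
    """Classifier-free-guidance-style combination with one weight per input:
    eps = eps(z | null) + sum_i w_i * (eps(z | c_i only) - eps(z | null))."""
    null = {name: np.zeros_like(c) for name, c in conds.items()}
    eps_uncond = eps_fn(z, null)
    out = eps_uncond.copy()
    for name, w in weights.items():
        only_i = dict(null)                    # every input nulled ...
        only_i[name] = conds[name]             # ... except modality `name`
        out += w * (eps_fn(z, only_i) - eps_uncond)
    return out

# --- Toy usage with random weights (hypothetical shapes) ---
rng = np.random.default_rng(0)
d = 8
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))

def toy_eps(z, conds):
    # Hypothetical denoiser: latent tokens attend to all condition tokens at once.
    context = np.concatenate(list(conds.values()), axis=0)
    return cross_attention(z, context, w_q, w_k, w_v)

z = rng.standard_normal((4, d))                # 4 latent tokens
conds = {"image": rng.standard_normal((3, d)),
         "text": rng.standard_normal((5, d))}
eps = guided_eps(toy_eps, z, conds, {"image": 1.0, "text": 0.5})
print(eps.shape)  # (4, 8)
```

Setting a weight to zero removes that input's influence, and raising it strengthens the pull toward that condition. Training with modality dropout is what makes the "null" predictions meaningful in the first place.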
Citation impact
- FWCI: 35.13
- Percentile: 100%
- References: 80
Authors: 5
Topics & keywords
- Computer science
- Encoder
- Interactivity
- Artificial intelligence
- Flexibility (engineering)
- Variety (cybernetics)
- 3D reconstruction
- Representation (politics)