This research introduces a 3D volumetric encoder for efficient and flexible text-to-3D generation. A lightweight network extracts feature volumes from multi-view images, and a diffusion model with a 3D U-Net backbone is then trained on these volumes for 3D generation. The work addresses the challenges of inaccurate object captions and high-dimensional feature volumes. The model allows fine-grained control over the characteristics of object parts through textual cues, enabling multiple concepts to be combined within a single object. Overall, this research contributes to 3D generation by introducing an efficient, flexible, and scalable representation.
Publication date: 19 Dec 2023
Project Page: https://github.com/tzco/VolumeDiffusion
Paper: https://arxiv.org/pdf/2312.11459
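The pipeline described above (feature volumes from an encoder, denoised by a 3D U-Net under a diffusion objective) can be sketched in miniature. This is a minimal illustrative sketch, not the paper's actual architecture: `TinyUNet3D`, the channel counts, and the scalar timestep conditioning are all simplified assumptions, and the loss is a generic DDPM-style noise-prediction objective.

```python
import torch
import torch.nn as nn

class TinyUNet3D(nn.Module):
    """Toy 3D U-Net-style denoiser over feature volumes (illustrative only)."""
    def __init__(self, channels=4, hidden=16):
        super().__init__()
        self.down = nn.Conv3d(channels, hidden, 3, stride=2, padding=1)
        self.mid = nn.Conv3d(hidden, hidden, 3, padding=1)
        self.up = nn.ConvTranspose3d(hidden, channels, 4, stride=2, padding=1)
        self.act = nn.SiLU()

    def forward(self, x, t):
        # Broadcast a scalar timestep over the volume as toy conditioning.
        h = self.act(self.down(x) + t.view(-1, 1, 1, 1, 1))
        h = self.act(self.mid(h))
        return self.up(h)

def diffusion_loss(model, volume, t, alpha_bar):
    """One DDPM-style training step on a feature volume: predict the added noise."""
    noise = torch.randn_like(volume)
    a = alpha_bar[t].view(-1, 1, 1, 1, 1)
    noisy = a.sqrt() * volume + (1 - a).sqrt() * noise
    pred = model(noisy, t.float())
    return nn.functional.mse_loss(pred, noise)

# Toy usage: a batch of 4-channel 16^3 volumes stands in for encoder output.
model = TinyUNet3D()
volume = torch.randn(2, 4, 16, 16, 16)
t = torch.randint(0, 100, (2,))
alpha_bar = torch.cumprod(1 - torch.linspace(1e-4, 0.02, 100), dim=0)
loss = diffusion_loss(model, volume, t, alpha_bar)
loss.backward()
```

In the actual method the volumes come from the multi-view encoder rather than random tensors, and conditioning comes from text embeddings, but the training loop has this noise-prediction shape.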