🤖 AI Summary
Existing 3D generation methods struggle to leverage readily available point cloud priors, which limits their geometric controllability. This work proposes Points-to-3D, the first approach to explicitly embed point cloud priors into a 3D diffusion generation pipeline built on the TRELLIS latent space. By combining a structure inpainting network, a latent initialization tailored to point clouds, and a two-stage sampling strategy (structural inpainting followed by boundary refinement), the method completes global geometry while preserving visible regions. It supports point cloud inputs derived from real LiDAR scans or monocular image estimates, and consistently outperforms state-of-the-art methods in both rendering quality and geometric fidelity across object and scene generation tasks, enabling high-fidelity, structurally controllable 3D synthesis.
📝 Abstract
Recent progress in 3D generation has been driven largely by models conditioned on images or text, while readily available 3D priors remain underused. In many real-world scenarios, visible-region point clouds are easy to obtain from active sensors such as LiDAR or from feed-forward predictors like VGGT, offering explicit geometric constraints that current methods fail to exploit. In this work, we introduce Points-to-3D, a diffusion-based framework that leverages point cloud priors for geometry-controllable 3D asset and scene generation. Built on the latent 3D diffusion model TRELLIS, Points-to-3D first replaces pure-noise sparse structure latent initialization with an input formulation tailored to point cloud priors. A structure inpainting network, trained within the TRELLIS framework on task-specific data designed to teach global structural inpainting, is then used at inference with a staged sampling strategy (structural inpainting followed by boundary refinement), completing the global geometry while preserving the visible regions of the input priors. In practice, Points-to-3D can take either accurate point-cloud priors or VGGT-estimated point clouds from single images as input. Experiments in both object and scene scenarios consistently demonstrate superior performance over state-of-the-art baselines in rendering quality and geometric fidelity, highlighting the effectiveness of explicitly embedding point-cloud priors for more accurate and structurally controllable 3D generation.
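The staged sampling described above can be sketched in a few lines. This is a minimal, hypothetical illustration of masked-inpainting diffusion sampling, not the paper's actual implementation: `denoise_step`, the stage lengths, and the mask handling are all assumed names standing in for one reverse-diffusion step of the structure network and its schedule.

```python
import numpy as np

def staged_sampling(latent_init, visible_mask, denoise_step,
                    n_inpaint=30, n_refine=10):
    """Two-stage sampling sketch: structural inpainting, then boundary
    refinement. `denoise_step(x, t)` is a placeholder for one reverse
    diffusion step; all names here are illustrative, not the paper's API."""
    x = latent_init.copy()
    # Stage 1: structural inpainting -- denoise the full latent, but
    # re-impose the visible-region latents after each step so the input
    # geometry is preserved while occluded structure is completed.
    for t in range(n_inpaint, 0, -1):
        x = denoise_step(x, t)
        x[visible_mask] = latent_init[visible_mask]
    # Stage 2: boundary refinement -- a short run of unconstrained steps
    # to blend the completed geometry with the preserved visible regions.
    for t in range(n_refine, 0, -1):
        x = denoise_step(x, t)
    return x
```

With `n_refine=0` the visible region is returned exactly as given; the refinement stage trades a small amount of that exactness for smoother seams at the visible/occluded boundary.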