LT3SD: Latent Trees for 3D Scene Diffusion

📅 2024-09-12

🏛️ arXiv.org

📈 Citations: 9

✨ Influential: 1

🤖 AI Summary

Existing diffusion models for 3D generation work well for single objects but are limited in spatial extent and quality when extended to full scenes. LT3SD addresses this with a latent tree representation that encodes lower-frequency geometry and higher-frequency detail in a coarse-to-fine hierarchy, and it learns a diffusion process over the latent components of a scene at each resolution level. The model is trained on scene patches and synthesizes arbitrary-sized scenes through shared diffusion generation across multiple patches. Experiments demonstrate its benefits for large-scale, high-quality unconditional 3D scene generation and for probabilistic completion of partial scene observations.

📝 Abstract

We present LT3SD, a novel latent diffusion model for large-scale 3D scene generation. Recent advances in diffusion models have shown impressive results in 3D object generation, but are limited in spatial extent and quality when extended to 3D scenes. To generate complex and diverse 3D scene structures, we introduce a latent tree representation to effectively encode both lower-frequency geometry and higher-frequency detail in a coarse-to-fine hierarchy. We can then learn a generative diffusion process in this latent 3D scene space, modeling the latent components of a scene at each resolution level. To synthesize large-scale scenes with varying sizes, we train our diffusion model on scene patches and synthesize arbitrary-sized output 3D scenes through shared diffusion generation across multiple scene patches. Through extensive experiments, we demonstrate the efficacy and benefits of LT3SD for large-scale, high-quality unconditional 3D scene generation and for probabilistic completion for partial scene observations.
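
To make the coarse-to-fine idea concrete, here is a minimal sketch of a hand-built analogue of a latent tree. It assumes a fixed average-pooling decomposition (a Laplacian-pyramid-style split), whereas the paper describes a learned latent representation; the function names `build_tree` and `reconstruct` are hypothetical and not taken from the LT3SD code.

```python
import numpy as np

def downsample(vol, f=2):
    """Average-pool a 3D volume by factor f along every axis."""
    d, h, w = vol.shape
    return vol.reshape(d // f, f, h // f, f, w // f, f).mean(axis=(1, 3, 5))

def upsample(vol, f=2):
    """Nearest-neighbour upsampling by factor f along every axis."""
    return vol.repeat(f, axis=0).repeat(f, axis=1).repeat(f, axis=2)

def build_tree(vol, levels=3):
    """Split a volume into one coarse grid plus per-level detail residuals.

    Illustrative stand-in for a learned encoder: the coarse grid carries
    lower-frequency geometry, the residuals carry higher-frequency detail.
    """
    details = []
    cur = vol
    for _ in range(levels - 1):
        coarse = downsample(cur)
        details.append(cur - upsample(coarse))  # what the coarse level misses
        cur = coarse
    return cur, details[::-1]  # details ordered coarse-to-fine

def reconstruct(coarse, details):
    """Invert build_tree by adding detail back level by level."""
    cur = coarse
    for det in details:
        cur = upsample(cur) + det
    return cur

vol = np.random.default_rng(0).normal(size=(16, 16, 16))
coarse, details = build_tree(vol, levels=3)
restored = reconstruct(coarse, details)
```

In this toy version the decomposition is exactly invertible. In a generative setting, a diffusion model at each level would instead sample the coarse grid first and then the detail components conditioned on it.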

Problem

Research questions and friction points this paper is trying to address.

Generating large-scale 3D scenes with high quality

Encoding complex 3D scene structures hierarchically

Synthesizing arbitrary-sized 3D scenes from patches

Innovation

Methods, ideas, or system contributions that make the work stand out.

Latent tree representation for 3D scenes

Coarse-to-fine hierarchical diffusion process

Patch-based synthesis for scalable scenes
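
The patch-based synthesis above can be illustrated with a short sketch. It assumes that sharing a denoising step across patches means running a patch-level model on overlapping windows and averaging the overlapping predictions; the abstract does not spell out the exact scheme, so this is one plausible reading rather than the authors' procedure. `denoise_patch` is a hypothetical stand-in for a trained patch-level diffusion model.

```python
import numpy as np

def patch_starts(size, patch, stride):
    """Window start indices that cover [0, size) completely (needs size >= patch)."""
    starts = list(range(0, size - patch + 1, stride))
    if starts[-1] != size - patch:
        starts.append(size - patch)  # extra window so the far edge is covered
    return starts

def shared_step(scene, denoise_patch, patch=8, stride=4):
    """Apply one patch-level update to a scene of arbitrary horizontal size.

    scene: float array of shape (X, Y, Z); patches tile the X and Y axes.
    Overlapping predictions are averaged so neighbouring patches agree.
    """
    acc = np.zeros_like(scene)
    count = np.zeros_like(scene)
    for x in patch_starts(scene.shape[0], patch, stride):
        for y in patch_starts(scene.shape[1], patch, stride):
            window = scene[x:x + patch, y:y + patch, :]
            acc[x:x + patch, y:y + patch, :] += denoise_patch(window)
            count[x:x + patch, y:y + patch, :] += 1
    return acc / count

scene = np.random.default_rng(0).normal(size=(20, 12, 4))
# Toy "denoiser" that just shrinks values; a real model would go here.
updated = shared_step(scene, lambda w: 0.5 * w)
```

Because the same patch-level model is reused at every window, the output size is bounded by memory rather than by the training patch size, which is what makes arbitrary-sized scenes possible in this kind of scheme.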

🔎 Similar Papers

LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation