🤖 AI Summary
Existing 3D scene generation methods suffer from geometric and appearance inconsistency across views, poor scalability of implicit representations, and object-centric 3D models that transfer poorly to large scenes. This paper proposes WorldGrow, a hierarchical framework for 3D scene generation that supports unbounded spatial extension. Built on curated, high-quality scene block data, it combines priors from pretrained 3D models with structured latent representations, introduces a context-aware 3D block inpainting mechanism for extending a scene, and adopts a coarse-to-fine generation strategy that targets both a plausible global layout and local geometric and textural fidelity. On 3D-FRONT, the method is reported to achieve state-of-the-art performance in geometry reconstruction while producing photorealistic, structurally consistent outputs across regions. The authors present it as a step toward constructing large-scale virtual environments and future world models.
📝 Abstract
We tackle the challenge of generating infinitely extendable 3D worlds: large, continuous environments with coherent geometry and realistic appearance. Existing methods face key challenges: 2D-lifting approaches suffer from geometric and appearance inconsistencies across views, 3D implicit representations are hard to scale up, and current 3D foundation models are mostly object-centric, limiting their applicability to scene-level generation. Our key insight is to leverage strong generation priors from pre-trained 3D models for structured scene block generation. To this end, we propose WorldGrow, a hierarchical framework for unbounded 3D scene synthesis. Our method features three core components: (1) a data curation pipeline that extracts high-quality scene blocks for training, making the 3D structured latent representations suitable for scene generation; (2) a 3D block inpainting mechanism that enables context-aware scene extension; and (3) a coarse-to-fine generation strategy that ensures both global layout plausibility and local geometric/textural fidelity. Evaluated on the large-scale 3D-FRONT dataset, WorldGrow achieves state-of-the-art performance in geometry reconstruction, while uniquely supporting infinite scene generation with photorealistic and structurally consistent outputs. These results highlight its capability for constructing large-scale virtual environments and its potential for building future world models.
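The idea of context-aware extension in component (2) can be illustrated with a toy sketch. The code below is not WorldGrow's implementation: it grows a 2D grid rather than a 3D structured latent, and `inpaint_block`, `grow_scene`, `BLOCK`, and `OVERLAP` are hypothetical names and values chosen for illustration, with a trivial mean-plus-noise fill standing in for a learned generative model. It only shows the control flow one might assume: each new block is generated from a window that overlaps its already generated neighbors, so the known cells constrain the new content.

```python
import random

BLOCK = 4    # block side length in cells (illustrative value)
OVERLAP = 1  # cells shared with already generated neighbors (illustrative)


def inpaint_block(context, rng):
    """Hypothetical stand-in for a learned block inpainting model.

    `context` is a list of rows where known cells hold floats and unknown
    cells hold None. Known cells are kept unchanged; unknown cells are
    filled with the mean of the known cells plus small Gaussian noise.
    """
    known = [v for row in context for v in row if v is not None]
    base = sum(known) / len(known) if known else 0.0
    return [
        [v if v is not None else base + rng.gauss(0.0, 0.1) for v in row]
        for row in context
    ]


def grow_scene(rows, cols, seed=0):
    """Grow a 2D grid 'scene' block by block in raster order."""
    rng = random.Random(seed)
    stride = BLOCK - OVERLAP
    height = stride * rows + OVERLAP
    width = stride * cols + OVERLAP
    scene = [[None] * width for _ in range(height)]
    for r in range(rows):
        for c in range(cols):
            y0, x0 = r * stride, c * stride
            # Crop the block window; cells written by earlier blocks
            # serve as the context the new block must agree with.
            context = [scene[y][x0:x0 + BLOCK] for y in range(y0, y0 + BLOCK)]
            block = inpaint_block(context, rng)
            for dy, row in enumerate(block):
                scene[y0 + dy][x0:x0 + BLOCK] = row
    return scene


if __name__ == "__main__":
    grid = grow_scene(rows=2, cols=3)
    print(len(grid), "x", len(grid[0]), "cells generated")
```

Because every window reuses the boundary cells of its neighbors, the grid can be extended by further rows or columns without revisiting earlier blocks. The coarse-to-fine stage described in component (3) is not modeled here.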