WorldGrow: Generating Infinite 3D World

📅 2025-10-24
🤖 AI Summary
Existing 3D scene generation methods have three main weaknesses: 2D-lifting approaches produce geometric and appearance inconsistencies across views, 3D implicit representations scale poorly, and object-level 3D models have limited capacity for large scenes. This paper proposes WorldGrow, a hierarchical 3D scene generation framework that supports unbounded spatial extension. It is trained on curated high-quality scene blocks, combines pretrained 3D model priors with a structured latent representation, adds a context-aware 3D block inpainting mechanism, and generates scenes coarse to fine. On 3D-FRONT, the method reports state-of-the-art geometry reconstruction and produces large, continuous scenes with coherent geometry and photorealistic appearance that stay structurally consistent across regions.

📝 Abstract
We tackle the challenge of generating infinitely extendable 3D worlds: large, continuous environments with coherent geometry and realistic appearance. Existing methods face key challenges: 2D-lifting approaches suffer from geometric and appearance inconsistencies across views, 3D implicit representations are hard to scale up, and current 3D foundation models are mostly object-centric, limiting their applicability to scene-level generation. Our key insight is to leverage strong generation priors from pre-trained 3D models for structured scene block generation. To this end, we propose WorldGrow, a hierarchical framework for unbounded 3D scene synthesis. Our method features three core components: (1) a data curation pipeline that extracts high-quality scene blocks for training, making the 3D structured latent representations suitable for scene generation; (2) a 3D block inpainting mechanism that enables context-aware scene extension; and (3) a coarse-to-fine generation strategy that ensures both global layout plausibility and local geometric/textural fidelity. Evaluated on the large-scale 3D-FRONT dataset, WorldGrow achieves state-of-the-art performance in geometry reconstruction, while uniquely supporting infinite scene generation with photorealistic and structurally consistent outputs. These results highlight its capability for constructing large-scale virtual environments and its potential for building future world models.

Problem

Research questions and friction points this paper is trying to address.

Generating infinitely extendable 3D worlds with coherent geometry
Overcoming the cross-view inconsistencies of 2D-lifting approaches and the scaling limits of 3D implicit representations
Enabling photorealistic infinite scene generation beyond object-centric approaches

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical framework for unbounded 3D scene synthesis
3D block inpainting mechanism for context-aware extension
Coarse-to-fine strategy ensuring layout plausibility and fidelity
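
The block inpainting idea listed above can be illustrated with a toy sketch. This is not the paper's implementation: it works on a 2D grid of scalar "latent" cells rather than 3D structured latents, and `inpaint_block`, `grow_world`, `BLOCK`, and `OVERLAP` are hypothetical names and values chosen for illustration. The placeholder "model" fills unknown cells with the mean of the known context plus noise, where a real system would run a learned generative model. The coarse-to-fine stage is not shown.

```python
import numpy as np

BLOCK = 8      # hypothetical block side length, in latent cells
OVERLAP = 2    # hypothetical overlap shared with already generated neighbors


def inpaint_block(window, known_mask, rng):
    """Placeholder for a learned block inpainting model (assumption).

    Known cells are kept as-is; unknown cells are filled with the mean of the
    known context plus a little noise. Only the interface is meaningful here.
    """
    base = window[known_mask].mean() if known_mask.any() else 0.0
    filled = window.copy()
    noise = rng.normal(scale=0.1, size=window.shape)
    filled[~known_mask] = base + noise[~known_mask]
    return filled


def grow_world(rows, cols, seed=0):
    """Grow a rows x cols grid of blocks in raster order.

    Each new window overlaps previously generated blocks by OVERLAP cells,
    so the placeholder model always sees its neighbors as context.
    """
    rng = np.random.default_rng(seed)
    stride = BLOCK - OVERLAP
    height = stride * rows + OVERLAP
    width = stride * cols + OVERLAP
    world = np.zeros((height, width))
    known = np.zeros((height, width), dtype=bool)
    for r in range(rows):
        for c in range(cols):
            ys = slice(r * stride, r * stride + BLOCK)
            xs = slice(c * stride, c * stride + BLOCK)
            world[ys, xs] = inpaint_block(world[ys, xs], known[ys, xs], rng)
            known[ys, xs] = True
    return world, known
```

Calling `grow_world(2, 3)` produces a single continuous array covering six overlapping blocks. Because each step only needs the window around the new block, the same loop could in principle keep extending the grid in any direction, which is the property the "unbounded" claim relies on.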