WorldPrompter: Traversable Text-to-Scene Generation

📅 2025-04-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing text-to-3D scene generation methods struggle to ensure scene completeness and navigability at the same time. This paper proposes the first end-to-end framework for generating fully traversable 3D scenes from text. It uses a 128-frame 360° panoramic video as an intermediate representation and introduces the first conditional diffusion model for controllable 360° video generation, jointly optimized with feedforward 3D Gaussian splatting for photorealistic geometry and texture reconstruction. To ensure coherence, the method incorporates multi-frame view consistency constraints and a text–video–3D cross-modal alignment mechanism. It surpasses state-of-the-art approaches in scene completeness (enabling free navigation over >10 m²), view consistency, and reconstruction fidelity: panoramic video FID improves by 23%, and 3D reconstruction PSNR increases by 5.1 dB.

📝 Abstract
Scene-level 3D generation is a challenging research topic, with most existing methods generating only partial scenes and offering limited navigational freedom. We introduce WorldPrompter, a novel generative pipeline for synthesizing traversable 3D scenes from text prompts. We leverage panoramic videos as an intermediate representation to model the 360° details of a scene. WorldPrompter incorporates a conditional 360° panoramic video generator, capable of producing a 128-frame video that simulates a person walking through and capturing a virtual environment. The resulting video is then reconstructed as Gaussian splats by a fast feedforward 3D reconstructor, enabling a true walkable experience within the 3D scene. Experiments demonstrate that our panoramic video generation model achieves convincing view consistency across frames, enabling high-quality panoramic Gaussian splat reconstruction and facilitating traversal over an area of the scene. Qualitative and quantitative results also show it outperforms the state-of-the-art 360° video generators and 3D scene generation models.
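
To make the idea of a panoramic frame as a 360° scene representation more concrete, the sketch below shows how a pixel in an equirectangular panorama can be mapped to a viewing direction on the unit sphere. This is a generic illustration of equirectangular projection, not code from the paper: the abstract does not state which panorama format or axis convention WorldPrompter uses, so the function names (`equirect_pixel_to_ray`, `panorama_rays`) and the convention (+y up, +z forward) are assumptions made for this example.

```python
import math


def equirect_pixel_to_ray(u, v, width, height):
    """Map pixel (u, v) of an equirectangular image to a unit direction.

    Assumed convention (not taken from the paper): longitude runs from
    -pi at the left edge to +pi at the right edge, latitude runs from
    +pi/2 at the top to -pi/2 at the bottom, +y is up, +z is forward.
    """
    # Sample at the pixel center, hence the +0.5 offsets.
    lon = ((u + 0.5) / width - 0.5) * 2.0 * math.pi
    lat = (0.5 - (v + 0.5) / height) * math.pi
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)


def panorama_rays(width, height):
    """Return a height x width grid of unit directions, one per pixel."""
    return [
        [equirect_pixel_to_ray(u, v, width, height) for u in range(width)]
        for v in range(height)
    ]


if __name__ == "__main__":
    rays = panorama_rays(8, 4)
    print(len(rays), len(rays[0]))  # grid dimensions: rows, columns
    print(rays[0][0])               # direction for the top-left pixel
```

Every pixel of a single frame corresponds to one direction around the camera, which is why one panoramic frame covers the full surroundings at that camera position, and a sequence of such frames along a walking path covers an area of the scene.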
Problem

Research questions and friction points this paper is trying to address.

Generating traversable 3D scenes from text prompts
Overcoming limited navigational freedom in scene-level 3D generation
Improving view consistency for high-quality panoramic reconstruction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates traversable 3D scenes from text prompts
Uses panoramic videos for 360° scene modeling
Reconstructs scenes as Gaussian splats with a fast feedforward 3D reconstructor