WorldPrompter: Traversable Text-to-Scene Generation

📅 2025-04-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing text-to-3D scene generation methods struggle to ensure scene completeness and navigability at the same time. This paper proposes the first end-to-end framework for generating fully traversable 3D scenes from text. It employs a 128-frame 360° panoramic video as an intermediate representation and introduces the first conditional diffusion model for controllable 360° video generation, jointly optimized with feedforward 3D Gaussian splatting for photorealistic geometry and texture reconstruction. To ensure coherence, the method incorporates multi-frame view consistency constraints and a text–video–3D cross-modal alignment mechanism. It surpasses state-of-the-art approaches in scene completeness (enabling free navigation over >10 m²), view consistency, and reconstruction fidelity: panoramic video FID improves by 23%, and 3D reconstruction PSNR increases by 5.1 dB.

📝 Abstract

Scene-level 3D generation is a challenging research topic, with most existing methods generating only partial scenes and offering limited navigational freedom. We introduce WorldPrompter, a novel generative pipeline for synthesizing traversable 3D scenes from text prompts. We leverage panoramic videos as an intermediate representation to model the 360° details of a scene. WorldPrompter incorporates a conditional 360° panoramic video generator, capable of producing a 128-frame video that simulates a person walking through and capturing a virtual environment. The resulting video is then reconstructed as Gaussian splats by a fast feedforward 3D reconstructor, enabling a true walkable experience within the 3D scene. Experiments demonstrate that our panoramic video generation model achieves convincing view consistency across frames, enabling high-quality panoramic Gaussian splat reconstruction and facilitating traversal over an area of the scene. Qualitative and quantitative results also show it outperforms the state-of-the-art 360° video generators and 3D scene generation models.
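To make the idea of a 360° panoramic frame more concrete, here is a minimal sketch of how a pixel in a panorama can correspond to a viewing direction. It assumes the common equirectangular layout (longitude along the image width, latitude along the height) and a y-up, z-forward axis convention. The abstract does not state which projection or convention WorldPrompter uses, so the function names and conventions below are hypothetical and purely illustrative.

```python
import math


def equirect_pixel_to_direction(u, v, width, height):
    """Map pixel coordinates (u, v) of an equirectangular image to a unit
    viewing direction (x, y, z).

    Assumed convention (not taken from the paper): y points up, z points
    forward, and the image center looks along +z.
    """
    # Longitude spans [-pi, pi) across the image width.
    lon = ((u + 0.5) / width - 0.5) * 2.0 * math.pi
    # Latitude spans [pi/2, -pi/2] from the top row to the bottom row.
    lat = (0.5 - (v + 0.5) / height) * math.pi
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)


def direction_to_equirect_pixel(x, y, z, width, height):
    """Inverse mapping: project a direction back to (u, v) pixel coordinates."""
    norm = math.sqrt(x * x + y * y + z * z)
    lon = math.atan2(x, z)
    # Clamp to guard against rounding slightly outside [-1, 1].
    lat = math.asin(max(-1.0, min(1.0, y / norm)))
    u = (lon / (2.0 * math.pi) + 0.5) * width - 0.5
    v = (0.5 - lat / math.pi) * height - 0.5
    return (u, v)
```

Under these assumptions, every pixel of a frame covers one direction on the full sphere around the camera, which is why a single panoramic video can observe a scene in all directions along the walking path.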

Problem

Research questions and friction points this paper is trying to address.

Generating traversable 3D scenes from text prompts

Overcoming limited navigational freedom in scene-level 3D generation

Improving view consistency for high-quality panoramic reconstruction

Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates traversable 3D scenes from text prompts

Uses panoramic videos for 360° scene modeling

Reconstructs the generated video as Gaussian splats with a fast feedforward 3D reconstructor

🔎 Similar Papers

RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion