🤖 AI Summary
Existing methods struggle to generate high-fidelity, fully view-consistent dynamic panoramic 4D scenes, and are typically limited to static content or narrow-field-of-view videos. To address this, we propose a dual-branch generative framework that jointly performs panoramic video synthesis and dynamic scene reconstruction. The generation stage enables fine-grained spatiotemporal control via bidirectional cross-attention between its two branches; the reconstruction stage leverages metric depth maps to guide the geometric alignment of 3D Gaussian splatting point clouds and jointly optimizes camera poses. To our knowledge, this is the first method to achieve geometrically consistent, motion-coherent, and view-invariant immersive panoramic 4D scene generation. Extensive experiments demonstrate significant improvements over state-of-the-art static and narrow-FOV approaches in visual realism, temporal consistency, and geometric stability. Our work establishes a new paradigm for constructing 360° dynamic virtual environments.
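The summary mentions using metric depth maps to align 3D Gaussian splatting point clouds. A standard building block for this is back-projecting a metric depth map into camera-space 3D points via the pinhole model; the sketch below illustrates that step only (the function name, toy intrinsics, and plane depth are illustrative assumptions, not the paper's actual pipeline):

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a metric depth map (H, W) into camera-space points (H*W, 3).

    Uses the pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Toy example (hypothetical intrinsics): a flat plane 2 m in front of the camera.
pts = depth_to_points(np.full((4, 4), 2.0), fx=100.0, fy=100.0, cx=2.0, cy=2.0)
# The principal-point pixel maps to (0, 0, 2): straight along the optical axis.
```

Points produced this way for each frame can then be compared or merged across time, which is the sense in which depth guides geometric alignment.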
📝 Abstract
With the rapid advancement and widespread adoption of VR/AR technologies, there is a growing demand for high-quality, immersive dynamic scenes. However, existing generative methods predominantly focus on static scenes or narrow perspective-view dynamic scenes, falling short of delivering a truly 360-degree immersive experience from any viewpoint. In this paper, we introduce **TiP4GEN**, an advanced text-to-dynamic panorama scene generation framework that enables fine-grained content control and synthesizes motion-rich, geometry-consistent panoramic 4D scenes. TiP4GEN integrates panorama video generation and dynamic scene reconstruction to create 360-degree immersive virtual environments. For video generation, we introduce a **Dual-branch Generation Model** consisting of a panorama branch and a perspective branch, responsible for global and local view generation, respectively. A bidirectional cross-attention mechanism facilitates comprehensive information exchange between the branches. For scene reconstruction, we propose a **Geometry-aligned Reconstruction Model** based on 3D Gaussian Splatting. By aligning spatial-temporal point clouds using metric depth maps and initializing scene cameras with estimated poses, our method ensures geometric consistency and temporal coherence for the reconstructed scenes. Extensive experiments demonstrate the effectiveness of our proposed designs and the superiority of TiP4GEN in generating visually compelling and motion-coherent dynamic panoramic scenes. Our project page is at https://ke-xing.github.io/TiP4GEN/.
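The bidirectional cross-attention described above means each branch's tokens attend to the other branch's tokens, so panorama and perspective features are updated with each other's context. A minimal single-head sketch of that exchange, assuming hypothetical token counts and feature width (not the paper's actual layer):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, d):
    """Single-head cross-attention: `queries` attend to `context` tokens.

    Simplified: the tokens themselves serve as Q/K/V (no learned projections).
    """
    scores = queries @ context.T / np.sqrt(d)   # (Nq, Nc) scaled dot products
    return softmax(scores, axis=-1) @ context   # context-weighted summaries

rng = np.random.default_rng(0)
d = 8
pano = rng.standard_normal((16, d))   # panorama-branch tokens (toy sizes)
persp = rng.standard_normal((4, d))   # perspective-branch tokens

# Bidirectional exchange with residual connections: each branch is
# enriched with information aggregated from the other branch.
pano_updated = pano + cross_attention(pano, persp, d)
persp_updated = persp + cross_attention(persp, pano, d)
```

In a real model each branch would apply learned Q/K/V projections and multiple heads; the point here is only the two-way direction of the attention, which is what distinguishes this design from one branch conditioning on the other unilaterally.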