4D Driving Scene Generation With Stereo Forcing

2025-09-24
Citations: 0
Influential: 0
AI Summary
Existing generative models struggle to perform both temporal extrapolation and novel-view synthesis (NVS) for dynamic 4D driving scenes without per-scene optimization, primarily because geometric and temporal consistency are difficult to model jointly. This paper proposes a unified generative framework addressing this challenge. We introduce Stereo Forcing, a conditioning strategy that leverages geometric uncertainty to guide diffusion-based denoising, explicitly enforcing geometric consistency across views and frames. To enable efficient 4D reconstruction, we integrate a pre-trained video VAE with a range-view adapter; furthermore, we design a geometry-guided video diffusion model to synthesize future multi-view sequences. Our method achieves state-of-the-art performance on appearance/geometry reconstruction, temporal generation, and NVS. It also delivers competitive results in downstream perception and motion prediction evaluations, supporting its robustness and practical utility.

πŸ“ Abstract
Current generative models struggle to synthesize dynamic 4D driving scenes that simultaneously support temporal extrapolation and spatial novel view synthesis (NVS) without per-scene optimization. Bridging generation and novel view synthesis remains a major challenge. We present PhiGenesis, a unified framework for 4D scene generation that extends video generation techniques with geometric and temporal consistency. Given multi-view image sequences and camera parameters, PhiGenesis produces temporally continuous 4D Gaussian splatting representations along target 3D trajectories. In its first stage, PhiGenesis leverages a pre-trained video VAE with a novel range-view adapter to enable feed-forward 4D reconstruction from multi-view images. This architecture supports single-frame or video inputs and outputs complete 4D scenes including geometry, semantics, and motion. In the second stage, PhiGenesis introduces a geometric-guided video diffusion model, using rendered historical 4D scenes as priors to generate future views conditioned on trajectories. To address geometric exposure bias in novel views, we propose Stereo Forcing, a novel conditioning strategy that integrates geometric uncertainty during denoising. This method enhances temporal coherence by dynamically adjusting generative influence based on uncertainty-aware perturbations. Our experimental results demonstrate that our method achieves state-of-the-art performance in appearance and geometric reconstruction, temporal generation, and novel view synthesis (NVS), while also delivering competitive performance in downstream evaluations. Homepage: https://jiangxb98.github.io/PhiGensis
Problem

Research questions and friction points this paper is trying to address.

Generating dynamic 4D driving scenes with temporal and spatial consistency
Bridging scene generation with novel view synthesis without per-scene optimization
Addressing geometric exposure bias in novel view synthesis tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified framework combining video generation with 4D Gaussian splatting
Geometric-guided video diffusion model using rendered scenes as priors
Stereo Forcing conditioning strategy for geometric uncertainty handling
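
As a rough intuition for the idea of "uncertainty-aware perturbations" mentioned in the abstract, the toy sketch below perturbs a rendered conditioning signal more strongly where a per-value geometric uncertainty is high, so that an unreliable prior carries less influence. This is not the paper's formulation: the function name `perturb_condition`, the `max_noise` parameter, the flat-list representation, and the variance-preserving mixing rule are all assumptions made purely for illustration.

```python
import math
import random


def perturb_condition(prior, uncertainty, max_noise=1.0, seed=0):
    """Toy uncertainty-weighted perturbation of a conditioning signal.

    prior       -- list of floats standing in for values rendered from a
                   reconstructed scene (hypothetical representation)
    uncertainty -- list of floats in [0, 1], one per prior value (hypothetical)
    max_noise   -- share of variance replaced by noise where uncertainty is 1;
                   must lie in [0, 1] (hypothetical parameter)
    """
    if len(prior) != len(uncertainty):
        raise ValueError("prior and uncertainty must have the same length")
    if not 0.0 <= max_noise <= 1.0:
        raise ValueError("max_noise must be in [0, 1]")

    rng = random.Random(seed)  # seeded so the sketch is reproducible
    perturbed = []
    for value, u in zip(prior, uncertainty):
        u = min(max(u, 0.0), 1.0)          # clamp uncertainty to [0, 1]
        noise_var = u * max_noise          # more uncertainty -> more noise
        keep = math.sqrt(1.0 - noise_var)  # weight left on the rendered prior
        perturbed.append(keep * value + math.sqrt(noise_var) * rng.gauss(0.0, 1.0))
    return perturbed


# Confident values pass through unchanged; fully uncertain values become pure noise.
confident = perturb_condition([0.5, -1.0], [0.0, 0.0])
uncertain_a = perturb_condition([5.0], [1.0])
uncertain_b = perturb_condition([-5.0], [1.0])
```

In this sketch a value with zero uncertainty is passed through untouched, while a value with full uncertainty (and `max_noise=1.0`) no longer depends on the prior at all. A real diffusion model would apply such a rule to latent tensors at each denoising step rather than to a flat list.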
Similar Papers
Hao Lu
Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China
Zhuang Ma
The Wharton School, University of Pennsylvania
Machine Learning, Statistics
Guangfeng Jiang
University of Science and Technology of China
Wenhang Ge
HKUST-GZ
Computer vision
Bohan Li
Shanghai Jiao Tong University, Shanghai, China
Yuzhan Cai
Wenzhao Zheng
EECS, University of California, Berkeley
Large Models, Embodied Agents, Autonomous Driving
Yunpeng Zhang
Yingcong Chen
Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China