Phys4D: Fine-Grained Physics-Consistent 4D Modeling from Video Diffusion

📅 2026-03-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing video diffusion models struggle to generate long-duration 4D dynamic scenes that are both physically consistent and spatiotemporally coherent. To address this, the authors propose Phys4D, the first framework to explicitly integrate physical-consistency modeling into the 4D generation pipeline. It employs a three-stage progressive training strategy: first, large-scale pseudo-supervised pretraining establishes foundational geometry and motion priors; second, physics-aware fine-tuning leverages simulation data to enforce physical plausibility; and third, simulation-guided reinforcement learning corrects residual physical violations. The authors also introduce a comprehensive 4D world-consistency evaluation suite covering geometric fidelity, motion stability, and long-term physical realism. Experiments show that Phys4D significantly improves spatiotemporal detail and physical coherence while preserving strong generative capability, outperforming existing appearance-driven approaches across all metrics.
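The three-stage schedule described above (pretrain, physics fine-tune, RL correction) can be illustrated with a toy gradient-descent analogue. Everything below is a didactic placeholder, not the authors' implementation: the "model" is a single scalar, the "simulator reward" is a quadratic penalty, and the RL step uses a finite-difference gradient estimate.

```python
# Toy analogue of a three-stage progressive training schedule.
# All functions and data here are illustrative placeholders.

def sgd_step(w, grad, lr=0.1):
    # One gradient-descent update on the scalar "model" w.
    return w - lr * grad

def pretrain(w, data):
    # Stage 1: pseudo-supervised regression toward noisy targets,
    # standing in for large-scale pseudo-supervised pretraining.
    for x, y in data:
        grad = 2 * (w * x - y) * x  # d/dw of (w*x - y)^2
        w = sgd_step(w, grad)
    return w

def physics_finetune(w, sim_data):
    # Stage 2: supervised fine-tuning on cleaner simulation targets.
    for x, y in sim_data:
        grad = 2 * (w * x - y) * x
        w = sgd_step(w, grad)
    return w

def rl_correct(w, reward_fn, steps=50, eps=1e-3):
    # Stage 3: reward-driven correction; the reward penalizes
    # "physics violations". Finite differences estimate the gradient.
    for _ in range(steps):
        grad = (reward_fn(w + eps) - reward_fn(w - eps)) / (2 * eps)
        w = sgd_step(w, -grad)  # ascend the reward
    return w
```

The point of the sketch is the structure, not the math: each stage starts from the previous stage's weights, and only the last stage optimizes a simulator-derived reward rather than a supervised loss.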

📝 Abstract
Recent video diffusion models have achieved impressive capabilities as large-scale generative world models. However, these models often struggle with fine-grained physical consistency, exhibiting physically implausible dynamics over time. In this work, we present Phys4D, a pipeline for learning physics-consistent 4D world representations from video diffusion models. Phys4D adopts a three-stage training paradigm that progressively lifts appearance-driven video diffusion models into physics-consistent 4D world representations. We first bootstrap robust geometry and motion representations through large-scale pseudo-supervised pretraining, establishing a foundation for 4D scene modeling. We then perform physics-grounded supervised fine-tuning using simulation-generated data, enforcing temporally consistent 4D dynamics. Finally, we apply simulation-grounded reinforcement learning to correct residual physical violations that are difficult to capture through explicit supervision. To evaluate fine-grained physical consistency beyond appearance-based metrics, we introduce a suite of 4D world consistency evaluations that probe geometric coherence, motion stability, and long-horizon physical plausibility. Experimental results demonstrate that Phys4D substantially improves fine-grained spatiotemporal and physical consistency compared to appearance-driven baselines, while maintaining strong generative performance. Our project page is available at https://sensational-brioche-7657e7.netlify.app/
Problem

Research questions and friction points this paper is trying to address.

physical consistency
video diffusion
4D modeling
spatiotemporal dynamics
physics plausibility
Innovation

Methods, ideas, or system contributions that make the work stand out.

physics-consistent 4D modeling
video diffusion models
three-stage training paradigm
simulation-grounded reinforcement learning
4D world consistency evaluation
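The motion-stability axis of the 4D world consistency evaluation could be probed with a simple trajectory-smoothness score. The paper does not specify its metrics, so the sketch below is a hypothetical stand-in: it measures the average magnitude of per-frame acceleration (second differences) of tracked points, where lower means smoother, more physically stable motion.

```python
# Hypothetical motion-stability probe: mean second-difference
# (acceleration) magnitude over tracked 2D point trajectories.
# Not the paper's actual metric; an illustrative stand-in only.

def motion_stability(tracks):
    """tracks: list of trajectories, each a list of (x, y) per frame.

    Returns the mean acceleration magnitude; 0.0 for perfectly
    uniform (constant-velocity) motion.
    """
    total, count = 0.0, 0
    for traj in tracks:
        for t in range(2, len(traj)):
            # Second difference approximates per-frame acceleration.
            ax = traj[t][0] - 2 * traj[t - 1][0] + traj[t - 2][0]
            ay = traj[t][1] - 2 * traj[t - 1][1] + traj[t - 2][1]
            total += (ax * ax + ay * ay) ** 0.5
            count += 1
    return total / max(count, 1)
```

A constant-velocity trajectory scores 0.0, while jittery back-and-forth motion scores high; in practice such a probe would run on point tracks extracted from the generated video.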