PhyWorld: Physics-Faithful World Model for Video Generation

📅 2026-05-18
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing video generation models often lack temporal coherence and violate physical laws, which limits their use as high-fidelity simulators for training physical AI. This work proposes a two-stage post-training approach. First, flow matching fine-tuning improves inter-frame visual and motion consistency. Second, Direct Preference Optimization (DPO) over physics preference pairs, described here as the first such application in video generation, explicitly aligns outputs with fundamental physical principles. The method reaches an average score of 0.769 on VBench, compared with 0.756 or below for state-of-the-art baselines, and 3.09 on a newly curated physical-faithfulness benchmark, compared with 2.99 for the strongest baseline.
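As a rough illustration of the first stage, the sketch below computes a flow matching regression loss on toy vectors. It assumes the common linear (rectified-flow style) interpolation path between a noise sample and a data sample; the summary does not state which path, parameterization, or conditioning PhyWorld uses, so the function names and the noise-to-data convention here are hypothetical.

```python
# Minimal flow matching loss on plain Python lists (illustrative only).
# Assumption: linear path x_t = (1 - t) * x0 + t * x1 with x0 = noise and
# x1 = data, so the regression target is the constant velocity x1 - x0.

def interpolate(x0, x1, t):
    """Point on the straight line from x0 (noise) to x1 (data) at time t."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Velocity of the linear path, independent of t."""
    return [b - a for a, b in zip(x0, x1)]

def flow_matching_loss(predict_velocity, x0, x1, t):
    """Mean squared error between predicted and target velocity at x_t."""
    x_t = interpolate(x0, x1, t)
    v_pred = predict_velocity(x_t, t)
    v_true = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_true)) / len(v_true)

# Toy "latents": in a real model these would be video latent tensors.
noise = [0.0, 1.0, -1.0]
data = [1.0, 1.0, 2.0]

# A hypothetical oracle that already knows the target velocity,
# and a trivial model that always predicts zero velocity.
oracle = lambda x_t, t: target_velocity(noise, data)
zero_model = lambda x_t, t: [0.0] * len(x_t)

oracle_loss = flow_matching_loss(oracle, noise, data, t=0.3)
zero_loss = flow_matching_loss(zero_model, noise, data, t=0.3)
```

In a video-continuation setting the velocity predictor would also receive the conditioning frames; that part is omitted here because the source does not describe it.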
📝 Abstract
World simulators can provide safe and scalable environments for training Physical AI systems before real-world deployment. Large video generation models are emerging as a promising basis for such simulators because they can generate diverse and realistic visual futures. However, using them as world simulators requires physically faithful video continuations, namely, generated videos that preserve the physical state implied by the conditioning input, and evolve in ways consistent with basic physical principles. We propose PhyWorld, a video generation world model designed to produce temporally coherent and physically faithful scene continuations through two-stage post-training. In the first stage, we improve video-to-video continuation with flow matching fine-tuning, encouraging stable visual attributes and coherent motion dynamics across frames. In the second stage, we align generated dynamics with physical principles using Direct Preference Optimization (DPO) over physics preference pairs, guiding the model toward outputs with higher physical plausibility. To evaluate PhyWorld, we use both standard video-quality benchmarks and a dedicated physical-faithfulness benchmark with per-law scoring. Experiments show that PhyWorld improves video consistency, achieving an average score of 0.769 on VBench compared with 0.756 or below for state-of-the-art baselines. PhyWorld also improves physical plausibility, reaching an average score of 3.09 on our physical-faithfulness benchmark compared with 2.99 for the strongest baseline. These results suggest that post-training large video generation models with continuation and physics-preference signals can make them more effective world simulators for Physical AI.
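To make the second stage more concrete, here is a minimal sketch of the standard DPO loss for a single preference pair. It assumes scalar log-likelihoods for the preferred (more physically plausible) and rejected video under both the trained policy and a frozen reference model. The abstract does not give PhyWorld's exact objective, and a flow-based video model would likely substitute a tractable surrogate for these log-likelihoods, so the argument names and the value of `beta` are assumptions.

```python
import math

def dpo_loss(logp_win, logp_lose, ref_logp_win, ref_logp_lose, beta=0.1):
    """Standard DPO loss for one (preferred, rejected) pair.

    loss = -log(sigmoid(beta * [(logp_win - ref_logp_win)
                                - (logp_lose - ref_logp_lose)]))
    beta is a hypothetical temperature; the source does not report one.
    """
    margin = beta * ((logp_win - ref_logp_win) - (logp_lose - ref_logp_lose))
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch for numerical stability.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Policy identical to the reference: the loss equals log(2).
neutral = dpo_loss(-2.0, -2.0, -2.0, -2.0)

# Policy raises the preferred video's likelihood and lowers the rejected one's.
aligned = dpo_loss(-1.0, -3.0, -2.0, -2.0)

# Policy does the opposite and is penalized.
misaligned = dpo_loss(-3.0, -1.0, -2.0, -2.0)
```

The loss falls below log(2) only when the policy widens the gap between preferred and rejected samples relative to the reference, which is the sense in which preference pairs steer the model toward physically plausible outputs.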
Problem

Research questions and friction points this paper is trying to address.

world model
video generation
physical faithfulness
physics simulation
temporal coherence
Innovation

Methods, ideas, or system contributions that make the work stand out.

physics-faithful
video generation
world model
flow matching
Direct Preference Optimization
Pu Zhao
Northeastern University
Adversarial learning, image classification, deep learning
Juyi Lin
Northeastern University
Timothy Rupprecht
Northeastern University
Arash Akbari
Northeastern University
Chence Yang
University of Georgia
Rahul Chowdhury
Northeastern University
Elaheh Motamedi
Northeastern University
Arman Akbari
Northeastern University
Yumei He
Tulane University
Chen Wang
EmbodyX
Geng Yuan
University of Georgia
Efficient AI, Explainable AI, Trustworthy ML, Edge Computing, AI Applications
Weiwei Chen
EmbodyX
Yanzhi Wang
Northeastern University
Energy efficient and High Performance Deep Learning, Neuromorphic Computing, Low-Power Circuits and