PhysCorr: Dual-Reward DPO for Physics-Constrained Text-to-Video Generation with Automated Preference Selection

📅 2025-11-06
🤖 AI Summary
Text-to-video generation models frequently violate physical laws, resulting in motion distortions and incoherent object interactions, which limits their deployment in high-reliability applications such as embodied AI, robotics, and simulation. To address this, we propose PhysicsRM, the first dual-dimensional physics reward model that separately quantifies *intra-object stability* and *inter-object dynamical interaction*. Building on it, we introduce PhyDPO, a physics-aware direct preference optimization framework that enables model-agnostic, scalable alignment with physical consistency. PhyDPO integrates a dual-reward mechanism, contrastive feedback learning, and physics-guided reweighting, and is compatible with both video diffusion models and video Transformers. Extensive experiments demonstrate that the framework significantly improves physical plausibility across multiple benchmarks while preserving visual quality and semantic fidelity, establishing a new paradigm for trustworthy, physically grounded video generation.
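The summary describes PhyDPO as a DPO objective modulated by a physics reward. A minimal sketch of what such a loss could look like is shown below; the `phys_dpo_loss` name, the sigmoid reweighting, and the `gamma` parameter are assumptions for illustration, since the paper's exact reweighting rule is not given here.

```python
import torch
import torch.nn.functional as F

def phys_dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l,
                  reward_gap, beta=0.1, gamma=1.0):
    """Standard DPO loss with a per-pair weight derived from the
    physics-reward gap (hypothetical scheme, not the paper's exact one).

    logp_w / logp_l: policy log-probs of the preferred / rejected video.
    ref_logp_w / ref_logp_l: same quantities under the frozen reference model.
    reward_gap: PhysicsRM reward difference (winner minus loser).
    """
    # Implicit reward margin between winner and loser, as in standard DPO.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # Physics-guided reweighting: pairs with a larger physics-reward gap
    # contribute more to the gradient (one plausible choice).
    weight = torch.sigmoid(gamma * reward_gap)
    return -(weight * F.logsigmoid(margin)).mean()
```

With `gamma = 0` in the exponent's limit (uniform weights of 0.5 up to scale), this reduces to ordinary DPO, so the reweighting is a strict generalization.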

📝 Abstract
Recent advances in text-to-video generation have achieved impressive perceptual quality, yet generated content often violates fundamental principles of physical plausibility, manifesting as implausible object dynamics, incoherent interactions, and unrealistic motion patterns. Such failures hinder the deployment of video generation models in embodied AI, robotics, and simulation-intensive domains. To bridge this gap, we propose PhysCorr, a unified framework for modeling, evaluating, and optimizing physical consistency in video generation. Specifically, we introduce PhysicsRM, the first dual-dimensional reward model that quantifies both intra-object stability and inter-object interactions. On this foundation, we develop PhyDPO, a novel direct preference optimization pipeline that leverages contrastive feedback and physics-aware reweighting to guide generation toward physically coherent outputs. Our approach is model-agnostic and scalable, enabling seamless integration into a wide range of video diffusion and transformer-based backbones. Extensive experiments across multiple benchmarks demonstrate that PhysCorr achieves significant improvements in physical realism while preserving visual fidelity and semantic alignment. This work takes a critical step toward physically grounded and trustworthy video generation.
Problem

Research questions and friction points this paper is trying to address.

Addresses physically implausible object dynamics in generated videos
Improves coherence of object interactions and motion patterns
Enhances physical consistency while preserving visual quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-dimensional reward model for physics evaluation
Direct preference optimization with physics-aware reweighting
Model-agnostic framework for video generation backbones
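The title's "automated preference selection" combined with the dual-dimensional reward suggests a pipeline that scores candidate videos on both physics dimensions and picks preference pairs automatically. The sketch below illustrates one such procedure under stated assumptions: the `score_fn` interface returning a `(stability, interaction)` tuple and the weighted-sum fusion via `alpha` are hypothetical, as the paper's aggregation rule is not specified on this page.

```python
def select_preference_pair(candidates, score_fn, alpha=0.5):
    """Automated preference selection (sketch): score each candidate video
    on both PhysicsRM dimensions, fuse the scores with a weighted sum
    (assumed fusion rule), and return the best and worst candidates as a
    (chosen, rejected) pair for DPO training.

    candidates: list of generated videos for the same prompt.
    score_fn(video) -> (stability, interaction): hypothetical scorer.
    """
    scored = []
    for video in candidates:
        stability, interaction = score_fn(video)
        combined = alpha * stability + (1.0 - alpha) * interaction
        scored.append((combined, video))
    # Sort ascending by fused physics reward.
    scored.sort(key=lambda pair: pair[0])
    return scored[-1][1], scored[0][1]  # (chosen, rejected)
```

Because pair construction needs only the reward model and not the generator's internals, this selection step is what makes the alignment procedure model-agnostic across diffusion and Transformer backbones.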
Authors
Peiyao Wang, Stony Brook University (computer vision)
Weining Wang, Institute of Automation, Chinese Academy of Sciences
Qi Li, Institute of Automation, Chinese Academy of Sciences