PhysRVG: Physics-Aware Unified Reinforcement Learning for Video Generative Models

📅 2026-01-16

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

This work addresses the lack of physical plausibility in existing Transformer-based video generation models, which often disregard rigid-body physics during pixel-level denoising, leading to unrealistic behaviors in collision scenarios. To overcome this limitation, we propose a physics-aware reinforcement learning paradigm that, for the first time, explicitly embeds Newtonian mechanics–driven collision rules as reinforcement signals directly into the high-dimensional generative space, rather than imposing them as post-hoc constraints. We introduce the Mimicry-Discovery Cycle (MDcycle), a unified framework that preserves physical feedback during large-scale fine-tuning, enabling co-optimization of physical fidelity and generative flexibility. Experiments on the newly established PhysRVGBench benchmark demonstrate that our approach significantly outperforms current methods in both physical realism and rigid-body motion consistency.

📝 Abstract

Physical principles are fundamental to realistic visual simulation, but remain a significant oversight in transformer-based video generation. This gap highlights a critical limitation in rendering rigid body motion, a core tenet of classical mechanics. While computer graphics and physics-based simulators can easily model such collisions using Newtonian formulas, modern pretrain-finetune paradigms discard the concept of object rigidity during pixel-level global denoising. Even perfectly correct mathematical constraints are treated as suboptimal solutions (i.e., conditions) during model optimization in post-training, fundamentally limiting the physical realism of generated videos. Motivated by these considerations, we introduce, for the first time, a physics-aware reinforcement learning paradigm for video generation models that enforces physical collision rules directly in high-dimensional spaces, ensuring that physics knowledge is strictly applied rather than treated as conditions. Subsequently, we extend this paradigm to a unified framework, termed Mimicry-Discovery Cycle (MDcycle), which allows substantial fine-tuning while fully preserving the model's ability to leverage physics-grounded feedback. To validate our approach, we construct a new benchmark, PhysRVGBench, and perform extensive qualitative and quantitative experiments to thoroughly assess its effectiveness.
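
To make concrete what "modeling collisions using Newtonian formulas" means in a simulator, the following is a minimal sketch of a one-dimensional rigid-body collision. It is not the paper's method or reward; the function name `collide_1d` and the `restitution` parameter are illustrative assumptions, and the formulas are the standard textbook ones (momentum conservation plus Newton's law of restitution).

```python
def collide_1d(m1, v1, m2, v2, restitution=1.0):
    """Post-collision velocities of two rigid bodies moving on a line.

    Hypothetical helper for illustration only (not from the paper).
    restitution = 1.0 is a perfectly elastic collision,
    restitution = 0.0 is a perfectly inelastic one.
    """
    total = m1 + m2
    # Momentum conservation: the centre-of-mass velocity is unchanged.
    v_cm = (m1 * v1 + m2 * v2) / total
    # Newton's law of restitution: the relative velocity reverses sign
    # and is scaled by the restitution coefficient.
    rel = v1 - v2
    u1 = v_cm - restitution * m2 * rel / total
    u2 = v_cm + restitution * m1 * rel / total
    return u1, u2


if __name__ == "__main__":
    # Equal masses, elastic collision: the bodies exchange velocities.
    print(collide_1d(1.0, 2.0, 1.0, 0.0))
```

A generated video whose objects violate these two relations at the moment of contact (for example, both objects speeding up) is physically implausible regardless of its visual quality, which is the kind of failure the abstract describes.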

Problem

Research questions and friction points this paper is trying to address.

physics-aware

video generation

rigid body motion

physical realism

collision modeling

Innovation

Methods, ideas, or system contributions that make the work stand out.

physics-aware reinforcement learning

video generative models

rigid body motion