🤖 AI Summary
Existing video generation models produce high-fidelity videos but often lack physical plausibility and 3D controllability. To address this, we propose a physics-anchored image-to-video generation framework. At its core is a generative physics network, realized as a diffusion model, that explicitly models multi-material dynamics—including elastic bodies, granular media (e.g., sand), plasticine, and rigid bodies—and synthesizes physically consistent 3D point trajectories that drive controllable video synthesis. A spatiotemporal attention module captures inter-particle interactions, and we jointly optimize trajectory plausibility and visual quality via a composite loss incorporating physics-based constraints. Trained on 550K synthetic samples, our approach surpasses state-of-the-art methods in both physical plausibility and visual fidelity. It enables fine-grained dynamic editing guided by physical parameters (e.g., elasticity, friction) and external forces (e.g., gravity, impact), offering fine control over physically grounded video generation.
📝 Abstract
Existing video generation models excel at producing photo-realistic videos from text or images, but often lack physical plausibility and 3D controllability. To overcome these limitations, we introduce PhysCtrl, a novel framework for physics-grounded image-to-video generation with physical parameters and force control. At its core is a generative physics network that learns the distribution of physical dynamics across four materials (elastic, sand, plasticine, and rigid) via a diffusion model conditioned on physics parameters and applied forces. We represent physical dynamics as 3D point trajectories and train on a large-scale synthetic dataset of 550K animations generated by physics simulators. We enhance the diffusion model with a novel spatiotemporal attention block that emulates particle interactions and incorporates physics-based constraints during training to enforce physical plausibility. Experiments show that PhysCtrl generates realistic, physics-grounded motion trajectories which, when used to drive image-to-video models, yield high-fidelity, controllable videos that outperform existing methods in both visual quality and physical plausibility. Project Page: https://cwchenwang.github.io/physctrl
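The spatiotemporal attention block above factorizes attention over particles (within a frame, emulating inter-particle interactions) and over frames (per particle, capturing dynamics). A minimal NumPy sketch of this factorization is shown below; the tensor shapes, the use of plain self-attention without learned projections, and the function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention over the second-to-last axis;
    # leading axes are treated as batch dimensions.
    d = q.shape[-1]
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d)
    return softmax(scores, axis=-1) @ v

def spatiotemporal_block(x):
    """x: (T, N, D) trajectory features -- T frames, N particles, D dims.
    Spatial attention mixes particles within each frame; temporal
    attention then mixes frames for each particle."""
    # Spatial: attend over the N particles at each time step (batched over T).
    x = x + attention(x, x, x)                # (T, N, D)
    # Temporal: attend over the T frames for each particle (batched over N).
    xt = np.swapaxes(x, 0, 1)                 # (N, T, D)
    xt = xt + attention(xt, xt, xt)
    return np.swapaxes(xt, 0, 1)              # back to (T, N, D)

traj = np.random.randn(8, 64, 16)             # 8 frames, 64 particles
out = spatiotemporal_block(traj)
print(out.shape)                              # (8, 64, 16)
```

In a real trajectory diffusion model, each attention would use learned query/key/value projections and be conditioned on the physics parameters and applied forces; the factorized particle-then-frame ordering shown here is one common design choice for keeping attention cost manageable.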