Morpheus: Benchmarking Physical Reasoning of Video Generative Models with Real Physical Experiments

📅 2025-04-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current video generation models lack physical consistency, which hinders the trustworthy development of embodied AI. Method: The authors introduce Morpheus, the first benchmark grounded in real-world physics experiments, comprising 80 empirically captured videos of diverse physical phenomena. They propose a no-ground-truth evaluation paradigm anchored in physical conservation laws (e.g., energy and momentum) and design a differentiable physical plausibility metric that combines physics-informed neural networks (PINNs) with vision-language models. Contribution/Results: Experiments reveal that state-of-the-art models, while visually compelling, exhibit substantial physical inconsistencies. All benchmark data, implementation code, and a live leaderboard are open-sourced, establishing a rigorous, publicly accessible standard for evaluating the world-modeling capabilities of generative video systems.

📝 Abstract
Recent advances in image and video generation raise hopes that these models possess world modeling capabilities, the ability to generate realistic, physically plausible videos. This could revolutionize applications in robotics, autonomous driving, and scientific simulation. However, before treating these models as world models, we must ask: Do they adhere to physical conservation laws? To answer this, we introduce Morpheus, a benchmark for evaluating video generation models on physical reasoning. It features 80 real-world videos capturing physical phenomena, guided by conservation laws. Since artificial generations lack ground truth, we assess physical plausibility using physics-informed metrics evaluated with respect to infallible conservation laws known per physical setting, leveraging advances in physics-informed neural networks and vision-language foundation models. Our findings reveal that even with advanced prompting and video conditioning, current models struggle to encode physical principles despite generating aesthetically pleasing videos. All data, leaderboard, and code are open-sourced at our project page.
Problem

Research questions and friction points this paper is trying to address.

Assessing video generative models' adherence to physical laws
Evaluating physical plausibility using physics-informed metrics
Benchmarking models' ability to generate realistic physical phenomena
Innovation

Methods, ideas, or system contributions that make the work stand out.

Benchmark with real-world physics videos
Physics-informed metrics for evaluation
Leverage vision-language foundation models
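To make the evaluation idea above concrete, here is a minimal toy sketch of a conservation-law-based plausibility score. It is not the paper's actual metric (Morpheus uses PINNs and vision-language models); it only assumes that per-frame object heights can be extracted from a generated free-fall video, and scores how well total mechanical energy is conserved across frames. The function name and scoring formula are illustrative choices.

```python
import numpy as np

def energy_conservation_score(y, dt, m=1.0, g=9.81):
    """Toy physical-plausibility score for a tracked falling object.

    y  : per-frame heights (metres) extracted from a generated video
    dt : time between frames (seconds)

    For free fall, total mechanical energy E = m*g*y + 0.5*m*v^2 should
    stay constant. We measure the relative drift of E across frames and
    map it to (0, 1]: no drift -> 1 (plausible), large drift -> near 0.
    """
    y = np.asarray(y, dtype=float)
    v = np.gradient(y, dt)               # finite-difference velocity
    E = m * g * y + 0.5 * m * v**2       # per-frame mechanical energy
    drift = np.std(E) / (np.mean(np.abs(E)) + 1e-9)
    return float(np.exp(-drift))

# Physically consistent free fall from 10 m: y(t) = 10 - 0.5*g*t^2
t = np.arange(0.0, 1.0, 0.01)
physical = 10.0 - 0.5 * 9.81 * t**2
score_good = energy_conservation_score(physical, dt=0.01)

# Unphysical motion (sqrt-shaped descent) violates energy conservation
unphysical = 10.0 - 5.0 * np.sqrt(t + 1e-9)
score_bad = energy_conservation_score(unphysical, dt=0.01)
```

A real metric must also handle the perception step (tracking objects in pixels) and settings where the conserved quantity differs per experiment, which is where the paper's PINN and vision-language components come in.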
👥 Authors
- Chenyu Zhang (University of Trento, Italy)
- Daniil Cherniavskii (University of Amsterdam, the Netherlands)
- Andrii Zadaianchuk (University of Amsterdam, the Netherlands)
- Antonios Tragoudaras (University of Amsterdam, the Netherlands)
- Antonios Vozikis (University of Amsterdam, the Netherlands)
- Thijmen Nijdam (University of Amsterdam, the Netherlands)
- Derck W. E. Prinzhorn (University of Amsterdam, the Netherlands)
- Mark Bodracska (University of Amsterdam, the Netherlands)
- N. Sebe (University of Trento, Italy)
- E. Gavves (University of Amsterdam, the Netherlands; Archimedes, Athena Research Center, Greece)