Physically Native World Models: A Hamiltonian Perspective on Generative World Modeling

📅 2026-05-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing world models often lack physical plausibility, action controllability, and long-term stability, which limits their use in embodied intelligence. This work proposes a Hamiltonian World Model that encodes observations into a structured latent phase space and evolves the latent state through Hamiltonian dynamics augmented with control inputs, dissipation, and residual corrections, with the aim of producing physically consistent and controllable predictions. The authors argue that this structure can improve physical consistency, interpretability, data efficiency, and long-horizon stability, while supporting planning based on trajectory rollouts. The paper presents the model as a more reliable and intervenable predictive foundation for embodied intelligence.

📝 Abstract

World models have recently re-emerged as a central paradigm for embodied intelligence, robotics, autonomous driving, and model-based reinforcement learning. However, current world model research is often dominated by three partially separated routes: 2D video-generative models that emphasize visual future synthesis, 3D scene-centric models that emphasize spatial reconstruction, and JEPA-like latent models that emphasize abstract predictive representations. While each route has made important progress, they still struggle to provide physically reliable, action-controllable, and long-horizon stable predictions for embodied decision making. In this paper, we argue that the bottleneck of world models is no longer only whether they can generate realistic futures, but whether those futures are physically meaningful and useful for action. We propose Hamiltonian World Models as a physically grounded perspective on world modeling. The key idea is to encode observations into a structured latent phase space, evolve the latent state through Hamiltonian-inspired dynamics with control, dissipation, and residual terms, decode the predicted trajectory into future observations, and use the resulting rollouts for planning. We discuss how Hamiltonian structure may improve interpretability, data efficiency, and long-horizon stability, while also noting practical challenges in real-world robotic scenes involving friction, contact, non-conservative forces, and deformable objects.
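
To make the pipeline in the abstract concrete, here is a minimal sketch of the latent evolution and planning steps. It is not the authors' implementation. It assumes the latent state is already given as a pair (q, p), so the encoder and decoder are omitted. The quadratic Hamiltonian, the linear damping term, the zero-valued `residual` stand-in for a learned correction, the random-shooting planner, and all function and parameter names are hypothetical choices made only for illustration.

```python
import numpy as np


def hamiltonian(q, p, mass=1.0, stiffness=1.0):
    """Toy separable energy H(q, p) = |p|^2 / (2m) + k |q|^2 / 2 (an assumption)."""
    return 0.5 * np.dot(p, p) / mass + 0.5 * stiffness * np.dot(q, q)


def residual(q, p):
    """Stand-in for a learned residual correction; returns zero in this sketch."""
    return np.zeros_like(p)


def step(q, p, u, dt=0.01, mass=1.0, stiffness=1.0, damping=0.1):
    """One semi-implicit (symplectic-Euler-style) update.

    dq/dt =  dH/dp
    dp/dt = -dH/dq - damping * dH/dp + u + residual(q, p)
    """
    dH_dq = stiffness * q
    dH_dp = p / mass
    p_next = p + dt * (-dH_dq - damping * dH_dp + u + residual(q, p))
    q_next = q + dt * (p_next / mass)  # position uses the updated momentum
    return q_next, p_next


def rollout(q0, p0, controls, **step_kwargs):
    """Apply a sequence of control inputs and return the latent trajectory."""
    q, p = np.asarray(q0, dtype=float), np.asarray(p0, dtype=float)
    trajectory = [(q, p)]
    for u in controls:
        q, p = step(q, p, np.asarray(u, dtype=float), **step_kwargs)
        trajectory.append((q, p))
    return trajectory


def plan(q0, p0, q_goal, horizon=50, n_candidates=64, seed=0, **step_kwargs):
    """Random-shooting planner: sample control sequences, keep the best rollout."""
    rng = np.random.default_rng(seed)
    best_controls, best_cost = None, np.inf
    for _ in range(n_candidates):
        controls = rng.uniform(-1.0, 1.0, size=(horizon, len(q0)))
        q_final, _ = rollout(q0, p0, controls, **step_kwargs)[-1]
        cost = float(np.sum((q_final - q_goal) ** 2))
        if cost < best_cost:
            best_controls, best_cost = controls, cost
    return best_controls, best_cost
```

With `damping=0.0` and zero control, the energy of this toy system stays close to its initial value over long rollouts, which is the kind of long-horizon behavior the abstract attributes to Hamiltonian structure. Setting `damping` above zero makes the energy decay, a simple stand-in for the non-conservative effects (such as friction) that the abstract lists as practical challenges.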

Problem

Research questions and friction points this paper is trying to address.

world models

physical reliability

action-controllable prediction

long-horizon stability

embodied intelligence

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hamiltonian dynamics

world models

physically grounded AI