AI Summary
Existing world models often lack physical plausibility, action controllability, and long-horizon stability, which limits their usefulness for embodied intelligence. This work proposes Hamiltonian World Models, which encode observations into a structured latent phase space and evolve the latent state through Hamiltonian-inspired dynamics augmented with control inputs, dissipation, and residual corrections, with the aim of producing physically consistent and controllable predictions. The authors argue that this structure can improve physical consistency, interpretability, data efficiency, and long-horizon stability, and that the predicted trajectories can be rolled out for planning, giving embodied agents a more reliable and intervenable predictive foundation.
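The summary names the ingredients of the latent dynamics but not the equations. One standard way to write Hamiltonian dynamics with control, dissipation, and a residual term is the port-Hamiltonian form below. The symbols are assumptions made for illustration, not necessarily the paper's parameterization: (q, p) is the latent phase-space state, f_phi and g_psi are an encoder and decoder, H_theta is a learned Hamiltonian, R_theta is a positive semidefinite dissipation matrix, G_theta maps the control u into the dynamics, and r_theta is a residual correction.

```latex
% Assumed form (illustrative): encoder f_phi, decoder g_psi, learned Hamiltonian H_theta
(q_t, p_t) = f_\phi(o_t), \qquad \hat{o}_t = g_\psi(q_t, p_t)

\dot{q} = \frac{\partial H_\theta}{\partial p}, \qquad
\dot{p} = -\frac{\partial H_\theta}{\partial q}
          - R_\theta(q)\,\frac{\partial H_\theta}{\partial p}
          + G_\theta(q)\,u + r_\theta(q, p, u)

% Energy balance (chain rule): dissipation removes energy; control and residual can inject it
\frac{dH_\theta}{dt}
  = -\Big(\frac{\partial H_\theta}{\partial p}\Big)^{\!\top} R_\theta(q)\,\frac{\partial H_\theta}{\partial p}
    + \Big(\frac{\partial H_\theta}{\partial p}\Big)^{\!\top}\big(G_\theta(q)\,u + r_\theta(q, p, u)\big)
```

With u = 0 and r_theta = 0, a positive semidefinite R_theta gives dH_theta/dt <= 0, so the latent energy cannot grow during an unforced rollout. This is one concrete sense in which Hamiltonian structure can constrain long-horizon drift.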
Abstract
World models have recently re-emerged as a central paradigm for embodied intelligence, robotics, autonomous driving, and model-based reinforcement learning. However, current world model research is often dominated by three partially separated routes: 2D video-generative models that emphasize visual future synthesis, 3D scene-centric models that emphasize spatial reconstruction, and JEPA-like latent models that emphasize abstract predictive representations. While each route has made important progress, they still struggle to provide physically reliable, action-controllable, and long-horizon stable predictions for embodied decision making. In this paper, we argue that the bottleneck of world models is no longer only whether they can generate realistic futures, but whether those futures are physically meaningful and useful for action. We propose Hamiltonian World Models as a physically grounded perspective on world modeling. The key idea is to encode observations into a structured latent phase space, evolve the latent state through Hamiltonian-inspired dynamics with control, dissipation, and residual terms, decode the predicted trajectory into future observations, and use the resulting rollouts for planning. We discuss how Hamiltonian structure may improve interpretability, data efficiency, and long-horizon stability, while also noting practical challenges in real-world robotic scenes involving friction, contact, non-conservative forces, and deformable objects.
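The encode, evolve, decode, and plan loop described in the abstract can be illustrated with a deliberately tiny, hand-specified example. This is a minimal sketch under our own assumptions, not the authors' model: a one-degree-of-freedom mass-spring system stands in for the learned latent phase space; `ToyHamiltonianWorldModel`, `plan_random_shooting`, and all parameter values are hypothetical; the encoder and decoder are fixed maps rather than networks; the residual term is a zero placeholder; and planning uses plain random shooting.

```python
import random
from dataclasses import dataclass


@dataclass
class ToyHamiltonianWorldModel:
    """Hypothetical 1-DOF mass-spring world model; parameters are hand-set, not learned."""
    mass: float = 1.0
    stiffness: float = 1.0
    damping: float = 0.1  # dissipation coefficient (R >= 0); 0.0 gives a conservative system
    dt: float = 0.05

    def encode(self, obs):
        # Hypothetical encoder: observation (position, velocity) -> phase space (q, p).
        x, v = obs
        return x, self.mass * v

    def decode(self, q, p):
        # Hypothetical decoder: phase space (q, p) -> predicted observation (position, velocity).
        return q, p / self.mass

    def energy(self, q, p):
        # H(q, p) = p^2 / (2m) + k q^2 / 2
        return p * p / (2 * self.mass) + 0.5 * self.stiffness * q * q

    def residual(self, q, p, u):
        # Placeholder for a learned residual correction r(q, p, u); always zero in this sketch.
        return 0.0

    def step(self, q, p, u):
        # Semi-implicit (symplectic) Euler: update p from the forces, then q from the new p.
        dH_dq = self.stiffness * q
        dH_dp = p / self.mass
        p_dot = -dH_dq - self.damping * dH_dp + u + self.residual(q, p, u)
        p_new = p + self.dt * p_dot
        q_new = q + self.dt * p_new / self.mass
        return q_new, p_new

    def rollout(self, obs, controls):
        # Encode once, evolve in phase space, decode every predicted state.
        q, p = self.encode(obs)
        traj = []
        for u in controls:
            q, p = self.step(q, p, u)
            traj.append(self.decode(q, p))
        return traj


def plan_random_shooting(model, obs, target_x, horizon=40, samples=200, u_max=2.0, seed=0):
    """Return the sampled control sequence whose rollout ends closest to target_x, and its cost."""
    rng = random.Random(seed)
    best_cost, best_seq = float("inf"), None
    for _ in range(samples):
        seq = [rng.uniform(-u_max, u_max) for _ in range(horizon)]
        final_x, final_v = model.rollout(obs, seq)[-1]
        cost = (final_x - target_x) ** 2 + 0.1 * final_v ** 2
        if cost < best_cost:
            best_cost, best_seq = cost, seq
    return best_seq, best_cost
```

With `damping=0.0`, the semi-implicit Euler step keeps the energy within a few percent of its initial value over long rollouts, while a positive `damping` makes it decay; this mirrors the stability argument in the abstract. Real robotic scenes with contact, friction, and deformable objects would need the learned dissipation and residual terms that this toy leaves fixed or empty.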