🤖 AI Summary
Conventional latent world models over-smooth hybrid robot dynamics, in which continuous motion is tightly coupled with discrete events (e.g., contact, impact), and the resulting errors accumulate during long-horizon planning. To address this, we propose the Prismatic World Model (PRISM-WM), a structured latent world model that combines a context-aware Mixture-of-Experts (MoE) architecture with a gating mechanism that implicitly identifies the current dynamic mode. An orthogonalization objective in the latent space promotes expert diversity, so that distinct physical modes (e.g., sticking vs. sliding, flight vs. stance) are represented separately and composed adaptively. The model supports high-fidelity trajectory rollouts and significantly reduces rollout drift on high-dimensional humanoid robots and in multi-task settings. It provides a robust dynamics model for model-based planning algorithms such as TD-MPC, which positions it as a core modeling component for next-generation model-based agents.
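To make the expert-diversity idea concrete, here is a minimal sketch of one possible orthogonality penalty. The summary does not state the exact form of the objective, so the choice below (mean squared cosine similarity between pairs of expert outputs) and the names `cosine` and `orthogonality_penalty` are assumptions made for illustration only, not the authors' formulation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)  # small constant avoids division by zero


def orthogonality_penalty(expert_outputs):
    """Mean squared cosine similarity over all distinct expert pairs.

    Assumed form (hypothetical): close to 0 when every pair of expert
    outputs is orthogonal, close to 1 when all outputs are parallel.
    """
    k = len(expert_outputs)
    if k < 2:
        raise ValueError("need at least two experts")
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    total = sum(cosine(expert_outputs[i], expert_outputs[j]) ** 2 for i, j in pairs)
    return total / len(pairs)


# Two experts that predict the same direction versus two that do not.
collapsed = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
diverse = [[1.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
```

Under this assumed form, minimizing the penalty pushes experts toward distinct directions in latent space: `collapsed` scores near the maximum and `diverse` scores near zero.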
📝 Abstract
Model-based planning in robotic domains is fundamentally challenged by the hybrid nature of physical dynamics, where continuous motion is punctuated by discrete events such as contacts and impacts. Conventional latent world models typically employ monolithic neural networks that enforce global continuity, inevitably over-smoothing the distinct dynamic modes (e.g., sticking vs. sliding, flight vs. stance). For a planner, this smoothing results in catastrophic compounding errors during long-horizon lookaheads, rendering the search process unreliable at physical boundaries. To address this, we introduce the Prismatic World Model (PRISM-WM), a structured architecture designed to decompose complex hybrid dynamics into composable primitives. PRISM-WM leverages a context-aware Mixture-of-Experts (MoE) framework where a gating mechanism implicitly identifies the current physical mode, and specialized experts predict the associated transition dynamics. We further introduce a latent orthogonalization objective to ensure expert diversity, effectively preventing mode collapse. By accurately modeling the sharp mode transitions in system dynamics, PRISM-WM significantly reduces rollout drift. Extensive experiments on challenging continuous control benchmarks, including high-dimensional humanoids and diverse multi-task settings, demonstrate that PRISM-WM provides a superior high-fidelity substrate for trajectory optimization algorithms (e.g., TD-MPC), proving its potential as a powerful foundational model for next-generation model-based agents.
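The gated Mixture-of-Experts transition described above can be sketched as follows. This is a toy illustration under stated assumptions, not the PRISM-WM implementation: the class `MoEDynamics`, the linear gate and linear experts, the soft (softmax) mixing, and the residual update are all hypothetical choices made to keep the example short and dependency-free.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class MoEDynamics:
    """Toy latent transition: z' = z + sum_k g_k(z, a) * f_k(z, a)."""

    def __init__(self, latent_dim, action_dim, num_experts, seed=0):
        rng = random.Random(seed)
        in_dim = latent_dim + action_dim

        def rand_matrix(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        # Assumption: linear gate and linear experts, chosen only for brevity.
        self.gate = rand_matrix(num_experts, in_dim)
        self.experts = [rand_matrix(latent_dim, in_dim) for _ in range(num_experts)]

    def step(self, z, a):
        """Predict the next latent state and return the gate weights used."""
        x = list(z) + list(a)  # context = latent state concatenated with action
        weights = softmax(matvec(self.gate, x))  # soft guess at the active mode
        deltas = [matvec(W, x) for W in self.experts]  # one proposal per expert
        z_next = [
            zi + sum(w * d[i] for w, d in zip(weights, deltas))
            for i, zi in enumerate(z)
        ]
        return z_next, weights

    def rollout(self, z0, actions):
        """Apply step() along an action sequence, returning each latent state."""
        z, trajectory = list(z0), []
        for a in actions:
            z, _ = self.step(z, a)
            trajectory.append(z)
        return trajectory


model = MoEDynamics(latent_dim=4, action_dim=2, num_experts=3)
z_next, gate_weights = model.step([0.1, 0.1, 0.1, 0.1], [0.0, 1.0])
trajectory = model.rollout([0.0, 0.0, 0.0, 0.0], [[0.0, 1.0]] * 5)
```

In a planner such as TD-MPC, a `rollout` of this kind would be called on candidate action sequences. The intuition from the abstract is that a gate able to switch sharply between experts near a contact event avoids the averaging that a single monolithic network applies across modes.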