Prismatic World Model: Learning Compositional Dynamics for Planning in Hybrid Systems

📅 2025-12-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Conventional latent world models over-smooth hybrid robot dynamics, in which continuous motion is tightly coupled with discrete events (e.g., contact, impact), and the resulting errors accumulate during long-horizon planning. To address this, the paper proposes a structured latent world model that combines a context-aware Mixture-of-Experts (MoE) architecture with a gating mechanism that implicitly identifies the current dynamic mode. An orthogonalization constraint in the latent space promotes expert diversity, enabling disentangled representation and adaptive composition of distinct physical modes (e.g., sticking/sliding, flight/stance). The model produces high-fidelity trajectory rollouts and significantly reduces rollout drift on high-dimensional humanoid robots and in multi-task settings. It serves as the dynamics model for model-based planning algorithms such as TD-MPC, and the authors present it as a foundation for next-generation model-based agents.

📝 Abstract

Model-based planning in robotic domains is fundamentally challenged by the hybrid nature of physical dynamics, where continuous motion is punctuated by discrete events such as contacts and impacts. Conventional latent world models typically employ monolithic neural networks that enforce global continuity, inevitably over-smoothing the distinct dynamic modes (e.g., sticking vs. sliding, flight vs. stance). For a planner, this smoothing results in catastrophic compounding errors during long-horizon lookaheads, rendering the search process unreliable at physical boundaries. To address this, we introduce the Prismatic World Model (PRISM-WM), a structured architecture designed to decompose complex hybrid dynamics into composable primitives. PRISM-WM leverages a context-aware Mixture-of-Experts (MoE) framework where a gating mechanism implicitly identifies the current physical mode, and specialized experts predict the associated transition dynamics. We further introduce a latent orthogonalization objective to ensure expert diversity, effectively preventing mode collapse. By accurately modeling the sharp mode transitions in system dynamics, PRISM-WM significantly reduces rollout drift. Extensive experiments on challenging continuous control benchmarks, including high-dimensional humanoids and diverse multi-task settings, demonstrate that PRISM-WM provides a superior high-fidelity substrate for trajectory optimization algorithms (e.g., TD-MPC), proving its potential as a powerful foundational model for next-generation model-based agents.
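
The gated composition described in the abstract can be illustrated with a minimal sketch. This is not the PRISM-WM implementation: the class name `ToyMoEDynamics`, the linear experts, the residual update, and the gate that sees only the latent state and action are all assumptions made for illustration (the paper's gate is described as context-aware, which likely means richer inputs).

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class ToyMoEDynamics:
    """Hypothetical gated mixture of linear experts over a latent state."""

    def __init__(self, latent_dim, action_dim, num_experts, seed=0):
        rng = random.Random(seed)
        in_dim = latent_dim + action_dim

        def rand_matrix(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        # One gating matrix plus one linear map per expert (assumed shapes).
        self.gate_W = rand_matrix(num_experts, in_dim)
        self.experts = [rand_matrix(latent_dim, in_dim) for _ in range(num_experts)]

    def gate(self, z, a):
        """Soft mode assignment: one weight per expert, summing to 1."""
        return softmax(matvec(self.gate_W, z + a))

    def step(self, z, a):
        """Predict the next latent as z plus a gate-weighted sum of expert deltas."""
        weights = self.gate(z, a)
        deltas = [matvec(W, z + a) for W in self.experts]
        z_next = [
            zi + sum(w * d[i] for w, d in zip(weights, deltas))
            for i, zi in enumerate(z)
        ]
        return z_next, weights

    def rollout(self, z0, actions):
        """Apply step repeatedly; returns len(actions) + 1 latent states."""
        z, traj = list(z0), [list(z0)]
        for a in actions:
            z, _ = self.step(z, a)
            traj.append(z)
        return traj


model = ToyMoEDynamics(latent_dim=4, action_dim=2, num_experts=3)
z_next, weights = model.step([0.1, 0.2, 0.3, 0.4], [0.0, 1.0])
```

A planner such as TD-MPC would call something like `rollout` on candidate action sequences; the gate weights show which expert (mode) dominates at each step.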

Problem

Research questions and friction points this paper is trying to address.

Model-based planning in hybrid robotic systems with discrete events

Over-smoothing of distinct dynamic modes by monolithic neural networks

Catastrophic compounding errors during long-horizon lookaheads in planners
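
The failure mode listed above can be reproduced on a toy problem. The sketch below is an illustration under stated assumptions, not an experiment from the paper: `true_step` is a made-up one-dimensional hybrid map (steady drift, then an abrupt reset past a threshold), the "monolithic" model is a single least-squares line, and the per-mode model is given oracle mode labels, whereas PRISM-WM is described as inferring the mode implicitly.

```python
def true_step(x, drift=0.3, threshold=1.0, restitution=0.2):
    """Toy hybrid map: smooth drift below the threshold, abrupt reset at or above it."""
    return x + drift if x < threshold else restitution * x


def fit_line(xs, ys):
    """Closed-form least-squares fit of y = a * x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = cov / var
    return a, my - a * mx


def rollout(step_fn, x0, horizon):
    """Feed each prediction back in as the next input."""
    traj = [x0]
    for _ in range(horizon):
        traj.append(step_fn(traj[-1]))
    return traj


# Training data covering both modes.
xs = [i * 0.01 for i in range(150)]
ys = [true_step(x) for x in xs]

# Monolithic model: one line forced through both modes.
a, b = fit_line(xs, ys)

# Per-mode model: one line per mode, using oracle mode labels (an idealization).
lo = [(x, y) for x, y in zip(xs, ys) if x < 1.0]
hi = [(x, y) for x, y in zip(xs, ys) if x >= 1.0]
a_lo, b_lo = fit_line([x for x, _ in lo], [y for _, y in lo])
a_hi, b_hi = fit_line([x for x, _ in hi], [y for _, y in hi])


def mono_step(x):
    return a * x + b


def modal_step(x):
    return a_lo * x + b_lo if x < 1.0 else a_hi * x + b_hi


horizon = 20
truth = rollout(true_step, 0.0, horizon)
mono = rollout(mono_step, 0.0, horizon)
modal = rollout(modal_step, 0.0, horizon)

# Mean absolute rollout error against the true trajectory.
mono_err = sum(abs(p - t) for p, t in zip(mono, truth)) / len(truth)
modal_err = sum(abs(p - t) for p, t in zip(modal, truth)) / len(truth)
print(f"monolithic: {mono_err:.3f}  per-mode: {modal_err:.2e}")
```

The single line averages across the discontinuity, so its rollout settles near a fixed point while the true trajectory keeps cycling through resets; the per-mode model tracks the resets because each piece is fit only to its own regime.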

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Mixture-of-Experts to decompose hybrid dynamics

Employs latent orthogonalization to prevent mode collapse

Provides high-fidelity substrate for trajectory optimization
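
One way to picture a diversity term of the kind the orthogonalization objective suggests is a penalty on pairwise alignment between expert outputs. The exact form of the paper's objective is not given here, so the function below is only an assumed stand-in: `orthogonality_penalty` and its mean squared cosine similarity are hypothetical choices for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def orthogonality_penalty(expert_outputs):
    """Mean squared cosine similarity over all pairs of expert output vectors.

    0.0 when all experts are mutually orthogonal, 1.0 when they are all
    parallel (the collapsed case). Assumed form, not the paper's exact loss.
    """
    n = len(expert_outputs)
    if n < 2:
        return 0.0
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            total += cosine(expert_outputs[i], expert_outputs[j]) ** 2
            pairs += 1
    return total / pairs


diverse = orthogonality_penalty([[1.0, 0.0], [0.0, 1.0]])
collapsed = orthogonality_penalty([[1.0, 0.0], [2.0, 0.0]])
```

Added to the prediction loss with a small weight, a term like this would push experts away from producing the same latent update, which is the collapse the bullet above refers to.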

🔎 Similar Papers

Deep hybrid models: infer and plan in a dynamic world