🤖 AI Summary
This work addresses the high computational cost of existing dense Transformer-based world models, which hinders real-time deployment. The authors propose DDP-WM, a world model built on Disentangled Dynamics Prediction (DDP). DDP decomposes the evolution of the scene's latent state into two parts: sparse primary dynamics, driven by the dominant physical interactions, and secondary context-driven background updates. DDP-WM combines dynamic localization of the primary dynamics, cross-attention for background updates, and efficient processing of historical states. This design allocates computation more efficiently and gives downstream planners a smoother optimization landscape. On Push-T, the method achieves an approximately 9× inference speedup and raises the model predictive control (MPC) success rate from 90% to 98% compared with state-of-the-art dense models. The authors also report strong efficiency and performance on navigation, tabletop manipulation, and deformable and multi-body interaction tasks.
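A toy count of attention query-key pairs shows why confining heavy computation to a sparse set of active tokens can save compute. The sketch rests on illustrative assumptions, not on DDP-WM's published architecture. It assumes that a dense model runs full self-attention over all N latent tokens, that active tokens attend only to each other, and that background tokens cross-attend to the active tokens. The function name is hypothetical. The count also ignores MLP layers, the localization module, and history processing.

```python
def attention_pair_counts(num_tokens: int, num_active: int) -> dict[str, int]:
    """Toy cost model: query-key pairs scored by one attention layer.

    Assumption (illustrative only): the dense model uses full self-attention,
    while the decoupled model runs self-attention over the active tokens plus
    cross-attention from background queries to active keys.
    """
    if not 0 < num_active <= num_tokens:
        raise ValueError("num_active must be in (0, num_tokens]")
    dense = num_tokens * num_tokens                            # every token attends to every token
    active_self = num_active * num_active                      # active tokens attend among themselves
    background_cross = (num_tokens - num_active) * num_active  # background queries -> active keys
    return {"dense": dense, "decoupled": active_self + background_cross}


counts = attention_pair_counts(num_tokens=256, num_active=32)
print(counts)                                 # {'dense': 65536, 'decoupled': 8192}
print(counts["dense"] / counts["decoupled"])  # 8.0
```

Under these assumptions the decoupled count simplifies to N·k, so the saving is a factor of N/k. Here that factor is 8 for an arbitrarily chosen 16×16 token grid with 32 active tokens. This toy count is not a derivation of the roughly 9× speedup reported for DDP-WM.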
📝 Abstract
World models are essential for autonomous robotic planning. However, the substantial computational overhead of existing dense Transformer-based models significantly hinders real-time deployment. To address this efficiency-performance bottleneck, we introduce DDP-WM, a novel world model centered on the principle of Disentangled Dynamics Prediction (DDP). We hypothesize that latent state evolution in observed scenes is heterogeneous and can be decomposed into sparse primary dynamics driven by physical interactions and secondary context-driven background updates. DDP-WM realizes this decomposition through an architecture that integrates efficient historical processing with dynamic localization to isolate primary dynamics. By employing a cross-attention mechanism for background updates, the framework optimizes resource allocation and provides a smooth optimization landscape for planners. Extensive experiments demonstrate that DDP-WM achieves significant gains in efficiency and performance across diverse tasks, including navigation, precise tabletop manipulation, and complex deformable or multi-body interactions. Specifically, on the challenging Push-T task, DDP-WM achieves an approximately 9× inference speedup and improves the MPC success rate from 90% to 98% compared to state-of-the-art dense models. These results establish a promising path for developing efficient, high-fidelity world models. Code will be available at https://github.com/HCPLab-SYSU/DDP-WM.
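The decomposition described above can be sketched as a single prediction step over a set of latent patch tokens. This is a minimal NumPy sketch under stated assumptions:

- The caller supplies the localization scores, and a hypothetical top-k rule picks the active tokens.
- A single-head self-attention followed by a linear map (`w_primary`) stands in for the primary-dynamics predictor.
- Background tokens are updated by cross-attending to the predicted active tokens.

Action conditioning, history processing, and learned projections are omitted. All function and parameter names are hypothetical and are not taken from the DDP-WM code.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product attention without learned projections."""
    logits = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(logits, axis=-1) @ v


def disentangled_step(z, loc_scores, num_active, w_primary):
    """One hypothetical disentangled prediction step on latent tokens z of shape (N, D)."""
    n = z.shape[0]
    # Dynamic localization (assumed top-k rule): the highest-scoring tokens become active.
    active_idx = np.argsort(loc_scores)[-num_active:]
    background_mask = np.ones(n, dtype=bool)
    background_mask[active_idx] = False

    # Primary dynamics: self-attention among active tokens, then a linear map, as a residual.
    active = z[active_idx]
    active_next = active + attention(active, active, active) @ w_primary

    # Background update: background queries cross-attend to the predicted active tokens.
    background = z[background_mask]
    background_next = background + attention(background, active_next, active_next)

    # Scatter both parts back into a full (N, D) latent grid.
    z_next = np.empty_like(z)
    z_next[active_idx] = active_next
    z_next[background_mask] = background_next
    return z_next, active_idx


rng = np.random.default_rng(0)
z = rng.standard_normal((16, 8))   # 16 latent tokens of width 8
loc_scores = rng.random(16)        # stand-in for a learned localization head
w_primary = 0.1 * rng.standard_normal((8, 8))
z_next, active_idx = disentangled_step(z, loc_scores, num_active=4, w_primary=w_primary)
print(z_next.shape)                # (16, 8)
```

In this sketch only `num_active` tokens pass through the primary predictor, and the remaining tokens receive a single cross-attention residual. The cost of the primary branch therefore grows with the number of active tokens rather than with the full token grid.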