DDP-WM: Disentangled Dynamics Prediction for Efficient World Models

📅 2026-02-02
🤖 AI Summary
This work addresses the high computational cost of dense Transformer-based world models, which hinders real-time deployment. The authors propose DDP-WM, a world model built on Disentangled Dynamics Prediction (DDP): scene latent-state evolution is decomposed into sparse primary dynamics, driven by physical interactions, and secondary context-driven background updates. DDP-WM combines efficient historical processing with dynamic localization to isolate the primary dynamics and uses cross-attention for background updates, which optimizes resource allocation and provides a smooth optimization landscape for downstream planners. On the Push-T task, the method achieves approximately 9× faster inference and raises the model predictive control (MPC) success rate from 90% to 98% relative to state-of-the-art dense models. The authors also report efficiency and performance gains on navigation, precise tabletop manipulation, and deformable or multi-body interaction tasks.

📝 Abstract
World models are essential for autonomous robotic planning. However, the substantial computational overhead of existing dense Transformer-based models significantly hinders real-time deployment. To address this efficiency-performance bottleneck, we introduce DDP-WM, a novel world model centered on the principle of Disentangled Dynamics Prediction (DDP). We hypothesize that latent state evolution in observed scenes is heterogeneous and can be decomposed into sparse primary dynamics driven by physical interactions and secondary context-driven background updates. DDP-WM realizes this decomposition through an architecture that integrates efficient historical processing with dynamic localization to isolate primary dynamics. By employing a cross-attention mechanism for background updates, the framework optimizes resource allocation and provides a smooth optimization landscape for planners. Extensive experiments demonstrate that DDP-WM achieves significant efficiency and performance gains across diverse tasks, including navigation, precise tabletop manipulation, and complex deformable or multi-body interactions. Specifically, on the challenging Push-T task, DDP-WM achieves an approximately 9× inference speedup and improves the MPC success rate from 90% to 98% compared to state-of-the-art dense models. The results establish a promising path for developing efficient, high-fidelity world models. Code will be available at https://github.com/HCPLab-SYSU/DDP-WM.
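The decomposition described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the token layout, the change-magnitude rule used for localization, the `active_step` stand-in for the primary dynamics predictor, and the `mix` blending factor are all assumptions made for illustration. The sketch only shows the general idea of spending the expensive update on a few active tokens and refreshing the rest with a cheap cross-attention read.

```python
import math


def localize_active(prev_tokens, curr_tokens, k):
    """Pick the k tokens that changed most between two frames (assumed rule)."""
    change = [
        sum((c - p) ** 2 for p, c in zip(prev, curr))
        for prev, curr in zip(prev_tokens, curr_tokens)
    ]
    ranked = sorted(range(len(change)), key=lambda i: change[i], reverse=True)
    return sorted(ranked[:k])


def cross_attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * x for q, x in zip(query, key)) / scale for key in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


def predict_next(prev_tokens, curr_tokens, k, active_step, mix=0.1):
    """Toy disentangled update: heavy step on active tokens, light step elsewhere."""
    active = localize_active(prev_tokens, curr_tokens, k)
    # Primary dynamics: only the k active tokens go through the costly predictor.
    updated = {i: active_step(curr_tokens[i]) for i in active}
    context = [updated[i] for i in active]
    next_tokens = []
    for i, tok in enumerate(curr_tokens):
        if i in updated:
            next_tokens.append(updated[i])
        elif not context:
            next_tokens.append(list(tok))  # nothing active, background unchanged
        else:
            # Background update: each token queries the updated active tokens.
            ctx = cross_attend(tok, context, context)
            next_tokens.append([(1 - mix) * t + mix * c for t, c in zip(tok, ctx)])
    return next_tokens, active


# Hypothetical data: four 2-D tokens, only token 1 moved between frames.
prev = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
curr = [[0.0, 0.0], [1.5, 1.0], [2.0, 2.0], [3.0, 3.0]]
nxt, active = predict_next(prev, curr, k=1, active_step=lambda t: [x + 0.5 for x in t])
print(active)  # [1]
```

In this toy setting the costly predictor runs on `k` tokens rather than all of them, which is the intuition behind the reported efficiency gain; the actual speedup in the paper depends on architectural details not given in the abstract.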
Problem

Research questions and friction points this paper is trying to address.

world models
computational overhead
real-time deployment
efficiency-performance bottleneck
autonomous robotic planning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Disentangled Dynamics Prediction
World Models
Efficient Inference
Cross-Attention Mechanism
Model Predictive Control
Authors
Shicheng Yin
School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
Kaixuan Yin
School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
Weixing Chen
Sun Yat-sen University
Causality, Embodied AI, Medical Image Analysis
Yang Liu
School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
Guanbin Li
School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
Liang Lin
Fellow of IEEE/IAPR, Professor of Computer Science, Sun Yat-sen University
Embodied AI, Causal Inference and Learning, Multimodal Data Analysis