🤖 AI Summary
To address the challenge that model predictive control (MPC) in offline reinforcement learning struggles to adapt to novel reward functions and non-stationary dynamics, this paper proposes Diffusion Model Predictive Control (D-MPC). D-MPC is the first method to unify diffusion models for both multi-step action generation and multi-step dynamics modeling, jointly optimizing both components within online MPC. It enables zero-shot reward reconfiguration and dynamic adaptation, eliminating reliance on pre-specified rewards or fixed dynamics models. Technically, it integrates diffusion-based modeling, uncertainty-aware sequential generation, and model-based planning. On the D4RL benchmark, D-MPC significantly outperforms state-of-the-art model-based offline planning methods such as MBOP, while matching the performance of current top-tier model-based and model-free RL algorithms.
📝 Abstract
We propose Diffusion Model Predictive Control (D-MPC), a novel MPC approach that learns a multi-step action proposal and a multi-step dynamics model, both using diffusion models, and combines them for use in online MPC. On the popular D4RL benchmark, we show performance that is significantly better than existing model-based offline planning methods using MPC (e.g. MBOP) and competitive with state-of-the-art (SOTA) model-based and model-free reinforcement learning methods. We additionally illustrate D-MPC's ability to optimize novel reward functions at run time and adapt to novel dynamics, and highlight its advantages compared to existing diffusion-based planning baselines.