🤖 AI Summary
Offline decision-making must synthesize behaviors from fixed datasets, yet existing generative approaches often produce dynamically infeasible trajectories. To address this, we propose MPDiffuser (Model Predictive Diffuser), a compositional framework whose alternating diffusion sampling scheme interleaves updates from a task planner and a dynamics model, refining trajectories for both task alignment and dynamics consistency without further environment interaction. Its modular architecture combines a planner, a dynamics model, and a ranker that selects behaviors aligned with the task objectives. A theoretical rationale shows how the procedure balances fidelity to data priors with dynamics consistency; empirically, the compositional design lets the dynamics model learn from even low-quality data and adapt to novel dynamics. MPDiffuser shows consistent gains over existing approaches on the unconstrained D4RL and constrained DSRL benchmarks. We also report a preliminary study on vision-based control and a deployment on a real quadrupedal robot.
📝 Abstract
Offline decision-making requires synthesizing reliable behaviors from fixed datasets without further interaction, yet existing generative approaches often yield trajectories that are dynamically infeasible. We propose Model Predictive Diffuser (MPDiffuser), a compositional model-based diffusion framework consisting of: (i) a planner that generates diverse, task-aligned trajectories; (ii) a dynamics model that enforces consistency with the underlying system dynamics; and (iii) a ranker module that selects behaviors aligned with the task objectives. MPDiffuser employs an alternating diffusion sampling scheme, where planner and dynamics updates are interleaved to progressively refine trajectories for both task alignment and feasibility during the sampling process. We also provide a theoretical rationale for this procedure, showing how it balances fidelity to data priors with dynamics consistency. Empirically, the compositional design improves sample efficiency, as it leverages even low-quality data for dynamics learning and adapts seamlessly to novel dynamics. We evaluate MPDiffuser on both unconstrained (D4RL) and constrained (DSRL) offline decision-making benchmarks, demonstrating consistent gains over existing approaches. Furthermore, we present a preliminary study extending MPDiffuser to vision-based control tasks, showing its potential to scale to high-dimensional sensory inputs. Finally, we deploy our method on a real quadrupedal robot, showcasing its practicality for real-world control.
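The alternating sampling scheme described above can be illustrated with a toy sketch. This is not the authors' implementation: the learned planner, dynamics, and ranker networks are replaced by hand-written stand-ins, and every name here (`planner_step`, `dynamics_step`, `rank`, `alternating_sample`, `max_step`) is hypothetical. The sketch assumes a 1-D state, a feasibility rule that bounds the change between consecutive states, and a linearly decaying noise level; it only shows how planner and dynamics updates could be interleaved before a ranker picks one candidate.

```python
import numpy as np


def planner_step(traj, ref, sigma, rng):
    # Toy stand-in for a planner denoising update (assumption, not a learned model):
    # pull each candidate halfway toward a task-aligned reference path,
    # then re-inject Gaussian noise at the current noise level sigma.
    return traj + 0.5 * (ref - traj) + sigma * rng.standard_normal(traj.shape)


def dynamics_step(traj, start, max_step):
    # Toy stand-in for a dynamics update (assumption): enforce x[0] = start and
    # |x[t+1] - x[t]| <= max_step by clipping increments and re-integrating.
    deltas = np.clip(np.diff(traj, axis=1), -max_step, max_step)
    first = np.full((traj.shape[0], 1), start)
    return np.concatenate([first, start + np.cumsum(deltas, axis=1)], axis=1)


def rank(trajs, goal):
    # Toy stand-in for the ranker (assumption): prefer the candidate whose
    # final state lands closest to the goal.
    scores = -np.abs(trajs[:, -1] - goal)
    return trajs[np.argmax(scores)]


def alternating_sample(n_candidates=16, horizon=20, n_steps=30,
                       start=0.0, goal=1.0, max_step=0.1, seed=0):
    rng = np.random.default_rng(seed)
    ref = np.linspace(start, goal, horizon)             # task-aligned reference path
    traj = rng.standard_normal((n_candidates, horizon))  # start from pure noise
    for sigma in np.linspace(1.0, 0.0, n_steps):         # decaying noise level
        traj = planner_step(traj, ref, sigma, rng)        # task-alignment update
        traj = dynamics_step(traj, start, max_step)       # feasibility update
    return rank(traj, goal)


best = alternating_sample()
```

Because the dynamics update runs last in every iteration, the returned trajectory satisfies the toy feasibility rule by construction. In a learned setting both updates would be denoising steps of trained diffusion models rather than the hard projection used here.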