Model-Based Diffusion Sampling for Predictive Control in Offline Decision Making

📅 2025-12-09
🤖 AI Summary
Offline decision-making methods often produce dynamically infeasible trajectories because they neglect the system dynamics. MPDiffuser addresses this with an alternating diffusion sampling scheme that interleaves updates from a task planner and a dynamics model, refining trajectories for both task alignment and dynamics consistency without environment interaction. Its modular architecture (planner, dynamics model, ranker) combines diffusion-based planning, dynamics enforcement, and ranking-based behavior selection. The paper gives a theoretical rationale for how this procedure balances fidelity to data priors with dynamics consistency, and the compositional design lets the dynamics model learn from low-quality data and adapt to novel dynamics. MPDiffuser shows consistent gains over existing approaches on the D4RL and DSRL benchmarks, includes a preliminary study on vision-based control, and is deployed on a real quadrupedal robot.

📝 Abstract
Offline decision-making requires synthesizing reliable behaviors from fixed datasets without further interaction, yet existing generative approaches often yield trajectories that are dynamically infeasible. We propose Model Predictive Diffuser (MPDiffuser), a compositional model-based diffusion framework consisting of: (i) a planner that generates diverse, task-aligned trajectories; (ii) a dynamics model that enforces consistency with the underlying system dynamics; and (iii) a ranker module that selects behaviors aligned with the task objectives. MPDiffuser employs an alternating diffusion sampling scheme, where planner and dynamics updates are interleaved to progressively refine trajectories for both task alignment and feasibility during the sampling process. We also provide a theoretical rationale for this procedure, showing how it balances fidelity to data priors with dynamics consistency. Empirically, the compositional design improves sample efficiency, as it leverages even low-quality data for dynamics learning and adapts seamlessly to novel dynamics. We evaluate MPDiffuser on both unconstrained (D4RL) and constrained (DSRL) offline decision-making benchmarks, demonstrating consistent gains over existing approaches. Furthermore, we present a preliminary study extending MPDiffuser to vision-based control tasks, showing its potential to scale to high-dimensional sensory inputs. Finally, we deploy our method on a real quadrupedal robot, showcasing its practicality for real-world control.
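The alternating scheme described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the planner, dynamics model, and ranker below are hand-written stand-ins (the paper's versions are learned), and the 1-D system `s[t+1] = s[t] + a[t]`, the action limit `A_MAX`, the straight-line prior `REF`, and all function names are assumptions made for illustration only. The sketch only shows the control flow: interleave a planner update with a dynamics update while annealing the noise, then let a ranker pick among candidates.

```python
import numpy as np

# Toy setup (all values are illustrative assumptions, not from the paper).
H, A_MAX, S0, GOAL = 10, 0.5, 0.0, 3.0
REF = np.linspace(S0, GOAL, H + 1)  # stand-in for what a planner learned from data


def planner_step(states, noise_scale, rng):
    """Stand-in planner update: pull states toward the prior, then re-noise."""
    states = states + 0.5 * (REF - states)
    states = states + noise_scale * rng.standard_normal(states.shape)
    states[0] = S0  # the start state is known
    return states


def dynamics_step(states):
    """Stand-in dynamics update for s[t+1] = s[t] + a[t] with |a[t]| <= A_MAX."""
    # Recover actions from consecutive states and enforce the action limit.
    actions = np.clip(np.diff(states), -A_MAX, A_MAX)
    # Roll the actions forward so the states are exactly consistent with them.
    states = S0 + np.concatenate(([0.0], np.cumsum(actions)))
    return states, actions


def sample_trajectory(rng, num_steps=20, sigma0=0.3):
    """Interleave planner and dynamics updates while annealing the noise to zero."""
    states = rng.standard_normal(H + 1)  # start from pure noise
    for k in range(num_steps):
        noise_scale = sigma0 * (num_steps - 1 - k) / num_steps
        states = planner_step(states, noise_scale, rng)
        states, actions = dynamics_step(states)
    return states, actions


def rank(candidates):
    """Stand-in ranker: prefer the candidate that ends closest to the goal."""
    costs = [abs(states[-1] - GOAL) for states, _ in candidates]
    best = int(np.argmin(costs))
    return candidates[best], costs[best]


rng = np.random.default_rng(0)
candidates = [sample_trajectory(rng) for _ in range(16)]
(best_states, best_actions), best_cost = rank(candidates)
```

Because every iteration ends with the dynamics update, each returned candidate satisfies the toy dynamics and action limit exactly, while the planner update keeps pulling it toward task-relevant behavior. In the toy, that interplay is the balance between data prior and dynamics consistency that the abstract refers to.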

Problem

Research questions and friction points this paper is trying to address.

Generates feasible trajectories from offline data
Balances task alignment with dynamics consistency
Scales to high-dimensional and real-world control tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Compositional diffusion framework with planner, dynamics model, and ranker
Alternating diffusion sampling for task alignment and feasibility
Leverages low-quality data and adapts to novel dynamics
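
One way to see why low-quality data can still be useful for the dynamics component: transitions collected by a poor policy still obey the system's dynamics, even when they carry little information about good behavior. The sketch below illustrates that general point with ordinary least squares on a toy linear system. It is an assumption-laden stand-in, not the paper's dynamics model (which is diffusion-based); the matrices `A_true` and `B_true`, the noise level, and the random-action data are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear system s' = A s + B a (a discretized double integrator).
A_true = np.array([[1.0, 0.1],
                   [0.0, 1.0]])
B_true = np.array([[0.0],
                   [0.1]])

# "Low-quality" data: random states and random actions, no task reward involved.
N = 500
S = rng.standard_normal((N, 2))
U = rng.standard_normal((N, 1))
S_next = S @ A_true.T + U @ B_true.T + 0.01 * rng.standard_normal((N, 2))

# Fit [A B] by least squares on the stacked regressors [s, a].
X = np.hstack([S, U])                              # shape (N, 3)
theta, *_ = np.linalg.lstsq(X, S_next, rcond=None)  # shape (3, 2)
A_hat = theta[:2].T
B_hat = theta[2:].T
```

The recovered `A_hat` and `B_hat` match the true matrices closely even though none of the actions were chosen to solve a task. A planner, by contrast, needs data that reflects task-relevant behavior, which is one motivation for training the two components separately.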