Soft MPCritic: Amortized Model Predictive Value Iteration

📅 2026-04-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the computational inefficiency often encountered when combining reinforcement learning with model predictive control (MPC) at scale. The authors propose an efficient policy-synthesis framework that learns in a soft value space, using sampling-based planning both for online control and for value-target generation. An amortized warm-start mechanism reuses planned action sequences from prior planning iterations, and the terminal Q-function is aligned with a short-horizon MPC to implicitly extend the effective planning horizon. Integrating model predictive path integral (MPPI) control, fitted value iteration, ensemble dynamics models, and scenario-based planning, the method demonstrates superior sample efficiency, robustness, and scalability across both classical and complex control tasks.
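The amortized warm-start reuse described above can be sketched as shifting the previously planned open-loop action sequence by one step to seed the next solve. This is a minimal illustration, not the paper's implementation; the function name and the repeat-last-action padding are assumptions.

```python
import numpy as np

def shift_warm_start(prev_plan):
    """Recycle a planned open-loop action sequence as the next solve's
    initial mean: drop the executed first action, shift left, and repeat
    the final action to preserve the horizon length (a common heuristic)."""
    prev_plan = np.asarray(prev_plan, dtype=float)
    return np.concatenate([prev_plan[1:], prev_plan[-1:]])

# e.g. shift_warm_start([1.0, 2.0, 3.0]) -> array([2., 3., 3.])
```

Seeding batched MPPI value-target solves with such shifted sequences avoids re-planning from scratch at every state, which is what makes the value targets cheap to compute.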
📝 Abstract
Reinforcement learning (RL) and model predictive control (MPC) offer complementary strengths, yet combining them at scale remains computationally challenging. We propose soft MPCritic, an RL-MPC framework that learns in (soft) value space while using sample-based planning for both online control and value target generation. soft MPCritic instantiates MPC through model predictive path integral control (MPPI) and trains a terminal Q-function with fitted value iteration, aligning the learned value function with the planner and implicitly extending the effective planning horizon. We introduce an amortized warm-start strategy that recycles planned open-loop action sequences from online observations when computing batched MPPI-based value targets. This makes soft MPCritic computationally practical, while preserving solution quality. soft MPCritic plans in a scenario-based fashion with an ensemble of dynamics models trained for next-step prediction accuracy. Together, these ingredients enable soft MPCritic to learn effectively through robust, short-horizon planning on classic and complex control tasks. These results establish soft MPCritic as a practical and scalable blueprint for synthesizing MPC policies in settings where policy extraction and direct, long-horizon planning may fail.
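The core planning loop the abstract describes, MPPI with a learned terminal value bootstrapping short-horizon rollouts, can be sketched as follows. This is a generic one-dimensional illustration under assumed names (`mppi_step`, the toy dynamics and costs, and the stand-in `terminal_value` are not from the paper), not the authors' implementation.

```python
import numpy as np

def mppi_step(state, dynamics, running_cost, terminal_value,
              horizon=10, n_samples=256, noise_std=0.5,
              temperature=1.0, warm_start=None, seed=0):
    """One MPPI planning step with a terminal value bootstrap.

    Perturb a mean action sequence with Gaussian noise, roll out each
    sample through the dynamics model, and subtract the learned terminal
    value at the final state so short-horizon plans inherit long-horizon
    information. Returns the exponentially weighted average sequence.
    """
    rng = np.random.default_rng(seed)
    mean = np.zeros(horizon) if warm_start is None else np.asarray(warm_start, float)
    actions = mean + rng.normal(0.0, noise_std, (n_samples, horizon))

    s = np.full(n_samples, state, dtype=float)
    costs = np.zeros(n_samples)
    for t in range(horizon):
        costs += running_cost(s, actions[:, t])
        s = dynamics(s, actions[:, t])
    costs -= terminal_value(s)  # bootstrap: value extends the effective horizon

    w = np.exp(-(costs - costs.min()) / temperature)
    w /= w.sum()
    return w @ actions  # execute the first action; recycle the rest as warm start

# Toy check on a scalar system s' = s + 0.2*a, regulating the state to zero.
plan = mppi_step(
    state=5.0,
    dynamics=lambda s, a: s + 0.2 * a,
    running_cost=lambda s, a: s ** 2 + 0.01 * a ** 2,
    terminal_value=lambda s: -(s ** 2),  # stand-in for the learned Q/V
)
```

In the paper's setting the rollouts would additionally be scenario-based, evaluating each sampled sequence against an ensemble of learned dynamics models rather than the single hand-written model used here.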
Problem

Research questions and friction points this paper is trying to address.

reinforcement learning
model predictive control
computational scalability
value function
planning horizon
Innovation

Methods, ideas, or system contributions that make the work stand out.

soft MPCritic
model predictive control
amortized planning
value iteration
MPPI