🤖 AI Summary
To address the high computational cost, software-integration complexity, and lack of end-to-end optimizability in RL–MPC fusion, this paper proposes MPCritic, a plug-and-play differentiable MPC architecture. Methodologically, it embeds MPC as a trainable module within the RL policy network and updates the MPC parameters via batch gradient steps on a smoothed optimization loss landscape, avoiding costly online re-optimization and parametric sensitivity computation. The MPC structure remains intact throughout training, and all parameters are end-to-end differentiable, so hard-constraint satisfaction and efficient policy learning are ensured jointly. MPCritic is compatible with mainstream RL algorithms (e.g., PPO, SAC) and MPC solvers (e.g., CasADi, ACADO). Evaluated on classic control benchmarks, it demonstrates broad applicability across linear and nonlinear MPC formulations and diverse RL paradigms, with significantly reduced integration complexity, safer real-time deployment, and improved training efficiency.
📝 Abstract
The reinforcement learning (RL) and model predictive control (MPC) communities have developed vast ecosystems of theoretical approaches and computational tools for solving optimal control problems. Given their conceptual similarities but differing strengths, there has been increasing interest in synergizing RL and MPC. However, existing approaches tend to be limited for various reasons, including computational cost of MPC in an RL algorithm and software hurdles towards seamless integration of MPC and RL tools. These challenges often result in the use of "simple" MPC schemes or RL algorithms, neglecting the state-of-the-art in both areas. This paper presents MPCritic, a machine learning-friendly architecture that interfaces seamlessly with MPC tools. MPCritic utilizes the loss landscape defined by a parameterized MPC problem, focusing on "soft" optimization over batched training steps, thereby updating the MPC parameters while avoiding costly minimization and parametric sensitivities. Since the MPC structure is preserved during training, an MPC agent can be readily used for online deployment, where robust constraint satisfaction is paramount. We demonstrate the versatility of MPCritic, in terms of MPC architectures and RL algorithms that it can accommodate, on classic control benchmarks.
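The "soft" optimization idea in the abstract, taking cheap batched gradient steps on the loss landscape of a parameterized MPC problem instead of solving each minimization to optimality, can be illustrated with a toy sketch. Everything below (the LQR-style dynamics `A`, `B`, the costs `Q`, `R`, the terminal weight `P`, the step size and iteration count) is a hypothetical setup chosen for illustration, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical LQR-style problem (illustrative only, not from the paper)
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])   # dynamics: x' = A x + B u
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)                # stage state cost
R = np.array([[0.1]])        # stage input cost
P = np.eye(2)                # parameterized terminal cost (the trainable MPC parameter)

# Batch of states and warm-started actions
X = rng.normal(size=(32, 2))
U = np.zeros((32, 1))

# "Soft" optimization: batched gradient steps on the MPC loss
# J(x, u) = x'Qx + u'Ru + (Ax + Bu)' P (Ax + Bu),
# instead of solving argmin_u J per state with a full solver.
lr = 1.0
for _ in range(100):
    # analytic gradient w.r.t. u, per batch row: 2 R u + 2 B' P (A x + B u)
    G = 2.0 * U @ R + 2.0 * (X @ A.T + U @ B.T) @ P @ B
    U -= lr * G

# Closed-form per-state minimizer for comparison:
# u* = -(R + B' P B)^{-1} B' P A x
H = R + B.T @ P @ B
U_star = -(X @ A.T @ P @ B) / H[0, 0]
print(float(np.abs(U - U_star).max()))  # soft solution ≈ exact minimizer
```

Because no inner `argmin` blocks the gradient, the same batched view also allows the MPC parameters themselves (here, `P`) to be updated by backpropagating an RL critic loss through `J`, which is the mechanism the summary describes for training without parametric sensitivities.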