Predictive Lagrangian Optimization for Constrained Reinforcement Learning

📅 2025-01-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of joint optimization between constraint embedding and policy learning in constrained reinforcement learning (CRL), this paper establishes a unified equivalence framework bridging CRL and feedback control. Specifically, Lagrange multiplier updates are reformulated as an optimal feedback control problem, and a multiplier-guided policy learning mechanism is introduced to enable end-to-end co-optimization. Theoretically, we show that PID-Lagrangian methods constitute only a special case within this broader framework. Methodologically, we pioneer the integration of model predictive control (MPC) into Lagrangian optimization, proposing Predictive Lagrangian Optimization (PLO)—a novel paradigm for adaptive constraint handling. Evaluated on a multi-task constrained RL benchmark, PLO significantly expands the feasible policy region (+7.2%) while preserving average reward performance, demonstrating its effectiveness, generalizability, and robustness.

📝 Abstract
Constrained optimization is widely used in reinforcement learning to address complex control tasks. From a dynamic-system perspective, iteratively solving a constrained optimization problem can be framed as the temporal evolution of a feedback control system. Classical constrained optimization methods, such as penalty and Lagrangian approaches, inherently act as proportional and integral feedback controllers. In this paper, we propose a more generic equivalence framework connecting constrained optimization and feedback control, with the aim of developing more effective constrained RL algorithms. First, we define each step of the system evolution as determining the Lagrange multiplier by solving a multiplier feedback optimal control problem (MFOCP), in which the control input is the multiplier, the state is the policy parameters, the dynamics are described by policy gradient descent, and the objective is to minimize constraint violations. We then introduce a multiplier-guided policy learning (MGPL) module to update the policy parameters, and we prove that the optimal policy obtained by alternating MFOCP and MGPL coincides with the solution of the primal constrained RL problem, thereby establishing our equivalence framework. Furthermore, we point out that the existing PID Lagrangian method is merely a special case within our framework that employs a PID controller; the framework also accommodates various other feedback controllers, facilitating the development of new algorithms. As a representative instance, we employ model predictive control (MPC) as the feedback controller and propose a new algorithm called predictive Lagrangian optimization (PLO). Numerical experiments demonstrate its superiority over the PID Lagrangian method, enlarging the feasible region by up to 7.2% while achieving a comparable average reward.
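The abstract's core idea, treating the Lagrange multiplier as a control input chosen by a receding-horizon (MPC-style) controller rather than a fixed integral update, can be illustrated with a toy sketch. Everything below is illustrative and not from the paper: `predict_violation` stands in for a learned/approximate model of how one policy-gradient step under a given multiplier changes the constraint violation, and `plo_multiplier_update` picks, over a short horizon, the constant candidate multiplier minimizing predicted accumulated violation plus a small regularizer.

```python
def predict_violation(v, lam, k=0.1):
    """Toy violation dynamics: one policy-gradient step under multiplier
    lam shrinks the constraint violation v in proportion to lam."""
    return v - k * lam * v  # larger multiplier -> faster violation decay


def plo_multiplier_update(v0, horizon=5, candidates=(0.0, 0.5, 1.0, 2.0, 4.0)):
    """MPC-style multiplier choice: roll each constant candidate multiplier
    forward over the horizon and return the one minimizing predicted
    accumulated violation plus a mild penalty on multiplier magnitude."""
    best_lam, best_cost = None, float("inf")
    for lam in candidates:
        v, cost = v0, 0.0
        for _ in range(horizon):
            v = predict_violation(v, lam)
            cost += max(v, 0.0)   # penalize predicted constraint violation
        cost += 0.01 * lam        # regularize multiplier size
        if cost < best_cost:
            best_lam, best_cost = lam, cost
    return best_lam


# Receding-horizon loop: apply only the first chosen multiplier, step
# the (toy) policy dynamics, then re-plan from the new violation level.
v = 1.0
for _ in range(10):
    lam = plo_multiplier_update(v)
    v = predict_violation(v, lam)
```

Note the qualitative behavior this is meant to show: with a large violation the controller selects an aggressive multiplier, while once the violation is nearly zero the multiplier penalty dominates and a small (or zero) multiplier is chosen, which is the adaptive constraint handling the paper attributes to PLO versus a fixed PID rule.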
Problem

Research questions and friction points this paper is trying to address.

Constrained Reinforcement Learning
Rule Integration
Complex Task Solving
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement Learning
Constrained Optimization
MPC Controller
Tianqi Zhang
Oak Ridge National Laboratory
Puzhen Yuan
Xingjian College, Tsinghua University, Beijing, 100084, China
Guojian Zhan
School of Vehicle and Mobility, Tsinghua University, Beijing, 100084, China
Ziyu Lin
National University of Singapore, Singapore Management University
Yao Lyu
School of Vehicle and Mobility, Tsinghua University, Beijing, 100084, China
Zhenzhi Qin
Department of Mathematical Sciences, Tsinghua University, Beijing, 100084
Jingliang Duan
University of Science and Technology Beijing
Liping Zhang
Department of Mathematical Sciences, Tsinghua University, Beijing, 100084
Shengbo Eben Li
School of Vehicle and Mobility, Tsinghua University, Beijing, 100084, China; College of Artificial Intelligence, Tsinghua University, Beijing, 100084, China