Risk-Averse Constrained Reinforcement Learning with Optimized Certainty Equivalents

📅 2025-10-23
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing constrained reinforcement learning (CRL) methods for high-risk scenarios neglect tail risks—such as catastrophic events—in the reward distribution. Method: We propose a risk-aware CRL framework grounded in Optimized Certainty Equivalent (OCE), integrating OCE into constrained optimization to jointly ensure robustness over both reward support and temporal horizons, while preserving exact problem equivalence under parametric strong Lagrangian duality. The algorithm is compatible with mainstream RL solvers (e.g., PPO) and enjoys provable convergence under standard assumptions. Contribution/Results: This work is the first to systematically incorporate OCE into both modeling and optimization of constrained RL, enabling simultaneous tail-risk mitigation and stage-wise robustness guarantees. Empirical evaluation across diverse tasks demonstrates significant suppression of extreme-risk events, outperforming state-of-the-art risk-sensitive and constrained RL approaches.

📝 Abstract
Constrained optimization provides a common framework for dealing with conflicting objectives in reinforcement learning (RL). In most of these settings, the objectives (and constraints) are expressed through the expected accumulated reward. However, this formulation neglects risky or even possibly catastrophic events at the tails of the reward distribution, and is often insufficient for high-stakes applications in which the risk involved in outliers is critical. In this work, we propose a framework for risk-aware constrained RL, which exhibits per-stage robustness properties jointly in reward values and time using optimized certainty equivalents (OCEs). Our framework ensures an exact equivalent to the original constrained problem within a parameterized strong Lagrangian duality framework under appropriate constraint qualifications, and yields a simple algorithmic recipe which can be wrapped around standard RL solvers, such as PPO. Lastly, we establish the convergence of the proposed algorithm under common assumptions, and verify the risk-aware properties of our approach through several numerical experiments.
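For readers unfamiliar with OCEs: in the Ben-Tal–Teboulle definition (loss convention), OCE(X) = inf_λ { λ + E[φ(X − λ)] } for a suitable disutility φ, and choosing φ(t) = max(t, 0)/α recovers CVaR_α via the Rockafellar–Uryasev formula. A minimal numerical sketch of this on toy data (an illustration of the risk measure itself, not the paper's algorithm):

```python
import numpy as np

def oce(losses, phi, lam_grid):
    """OCE(X) = min over lam of { lam + E[phi(X - lam)] }  (loss convention)."""
    return min(lam + np.mean(phi(losses - lam)) for lam in lam_grid)

def cvar_phi(alpha):
    """phi(t) = max(t, 0) / alpha turns the OCE into CVaR_alpha."""
    return lambda t: np.maximum(t, 0.0) / alpha

losses = np.arange(1.0, 101.0)  # toy loss samples 1, 2, ..., 100
alpha = 0.1
# The minimizing lam is an alpha-tail quantile, so a grid over the
# sample values suffices here.
val = oce(losses, cvar_phi(alpha), lam_grid=losses)
# val equals the mean of the worst 10% of losses: (91 + ... + 100) / 10 = 95.5
```

Other choices of φ yield other familiar risk measures (e.g., an exponential φ gives the entropic risk), which is what makes the OCE a flexible umbrella for the tail-risk constraints discussed above.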
Problem

Research questions and friction points this paper is trying to address.

Addresses tail risk in reinforcement learning by accounting for the tails of the reward distribution
Provides a constrained optimization framework for conflicting objectives with robustness guarantees
Ensures safety in high-stakes applications where outlier risks are critical
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses optimized certainty equivalents for risk awareness
Ensures exact equivalence via strong Lagrangian duality
Wraps around standard reinforcement learning solvers
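The "wraps around standard solvers" recipe is, at its core, primal-dual: an inner solver (PPO in the paper) approximately optimizes the Lagrangian for a fixed multiplier, and the multiplier is updated by projected dual ascent. A toy convex sketch of that loop, with an exact inner solve standing in for the RL solver (an illustration under strong duality, not the paper's method):

```python
# Toy constrained problem: minimize f(x) = x^2  subject to  g(x) = 1 - x <= 0.
# Lagrangian: L(x, lam) = x^2 + lam * (1 - x).
lam, eta = 0.0, 0.1
for _ in range(500):
    x = lam / 2.0                          # inner "solver": argmin_x L(x, lam)
                                           # (stands in for a PPO-style policy update)
    lam = max(0.0, lam + eta * (1.0 - x))  # projected dual ascent on constraint g(x)

# The iterates converge to the KKT point x* = 1, lam* = 2.
```

In the RL setting, x becomes the policy parameters, the inner argmin becomes a few PPO updates on the Lagrangian-shaped reward, and g becomes the (risk-aware) constraint value estimated from rollouts.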