Measures of Variability for Risk-averse Policy Gradient

📅 2025-04-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the underexplored problem of measuring variability in risk-averse reinforcement learning (RARL). We evaluate nine common measures of variability, four of which had not previously been studied in RARL. On the theory side, we derive policy gradient formulas for these four measures, improve the gradient estimator for Gini Deviation, and analyze the gradient properties of the measures. On the method side, we incorporate the measures into REINFORCE and PPO to explicitly penalize the dispersion of returns. Experiments across several stochastic environments show that variance-based measures lead to unstable policy updates, whereas CVaR Deviation and Gini Deviation perform consistently, achieving high returns while learning risk-averse policies; Mean Deviation and Semi-Standard Deviation are also competitive. The work gives a comprehensive overview of variability measures in RARL and offers practical guidance for risk-aware decision-making.

📝 Abstract

Risk-averse reinforcement learning (RARL) is critical for decision-making under uncertainty, which is especially valuable in high-stakes applications. However, most existing works focus on risk measures, e.g., conditional value-at-risk (CVaR), while measures of variability remain underexplored. In this paper, we comprehensively study nine common measures of variability, namely Variance, Gini Deviation, Mean Deviation, Mean-Median Deviation, Standard Deviation, Inter-Quantile Range, CVaR Deviation, Semi-Variance, and Semi-Standard Deviation. Among them, four metrics have not been previously studied in RARL. We derive policy gradient formulas for these unstudied metrics, improve gradient estimation for Gini Deviation, analyze their gradient properties, and incorporate them into the REINFORCE and PPO frameworks to penalize the dispersion of returns. Our empirical study reveals that variance-based metrics lead to unstable policy updates. In contrast, CVaR Deviation and Gini Deviation show consistent performance across different randomness and evaluation domains, achieving high returns while effectively learning risk-averse policies. Mean Deviation and Semi-Standard Deviation are also competitive across different scenarios. This work provides a comprehensive overview of variability measures in RARL, offering practical insights for risk-aware decision-making and guiding future research on risk metrics and RARL algorithms.
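To make the measures named in the abstract concrete, the sketch below computes sample estimates of several of them from a list of episode returns. The definitions are common textbook forms and are assumptions here: the paper's exact definitions, normalizations, and tail conventions may differ. In particular, Gini Deviation is taken as half the mean absolute difference between two independent returns, Semi-Variance as the downside (below-mean) variant, and CVaR Deviation as the mean minus the average of the worst `alpha` fraction of returns. All function names are hypothetical.

```python
import math
from itertools import combinations
from statistics import fmean, median


def variance(returns):
    # Population variance: mean squared deviation from the mean.
    m = fmean(returns)
    return fmean([(g - m) ** 2 for g in returns])


def mean_deviation(returns):
    # Mean absolute deviation from the mean.
    m = fmean(returns)
    return fmean([abs(g - m) for g in returns])


def mean_median_deviation(returns):
    # Mean absolute deviation from the median.
    med = median(returns)
    return fmean([abs(g - med) for g in returns])


def semi_variance(returns):
    # Downside semi-variance (assumed convention): only returns
    # below the mean contribute.
    m = fmean(returns)
    return fmean([min(g - m, 0.0) ** 2 for g in returns])


def gini_deviation(returns):
    # Half the mean absolute difference over all distinct pairs
    # (assumed convention). Needs at least two returns.
    diffs = [abs(a - b) for a, b in combinations(returns, 2)]
    return 0.5 * fmean(diffs)


def cvar_deviation(returns, alpha=0.1):
    # Mean minus the average of the worst ceil(alpha * n) returns
    # (assumed lower-tail convention for returns).
    k = max(1, math.ceil(alpha * len(returns)))
    worst = sorted(returns)[:k]
    return fmean(returns) - fmean(worst)


# Hypothetical returns from four episodes.
sample_returns = [0.0, 2.0, 4.0, 6.0]
```

Standard Deviation and Semi-Standard Deviation follow as the square roots of `variance` and `semi_variance`. The pairwise Gini estimate costs O(n²) comparisons, which is fine for illustration but not how one would estimate it at scale.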

Problem

Research questions and friction points this paper is trying to address.

Studies nine variability measures for risk-averse reinforcement learning.

Derives policy gradients for four variability measures not previously studied in RARL.

Evaluates performance of variability measures in decision-making under uncertainty.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Derives policy gradient formulas for previously unstudied variability measures.

Improves gradient estimation for Gini Deviation.

Incorporates the measures into the REINFORCE and PPO frameworks.
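As a rough illustration of what "penalizing the dispersion of returns" can look like in a policy gradient setting, the block below writes a generic mean-minus-deviation objective and the standard likelihood-ratio gradients for the mean and, as a familiar reference case, the variance. The notation (return G, trajectory τ, variability measure D, trade-off weight λ) and the specific form of the objective are assumptions made for this sketch; the paper's own formulation and its gradients for the other measures are not reproduced here.

```latex
% Assumed penalized objective: expected return minus a weighted variability term.
J_\lambda(\theta) \;=\; \mathbb{E}_{\tau \sim p_\theta}\big[G(\tau)\big] \;-\; \lambda\, \mathbb{D}\big[G(\tau)\big],
\qquad \lambda \ge 0 .

% Likelihood-ratio (REINFORCE) gradient of the expected return.
\nabla_\theta\, \mathbb{E}[G] \;=\; \mathbb{E}\big[\, G(\tau)\, \nabla_\theta \log p_\theta(\tau) \,\big].

% Reference case D = Var: since Var[G] = E[G^2] - (E[G])^2,
\nabla_\theta\, \mathrm{Var}[G]
  \;=\; \mathbb{E}\big[\, \big(G(\tau)^2 - 2\,\mathbb{E}[G]\, G(\tau)\big)\, \nabla_\theta \log p_\theta(\tau) \,\big].
```

The variance case follows from applying the likelihood-ratio identity to each term of Var[G] = E[G²] − (E[G])². It is included only as a well-known baseline, not as one of the newly derived formulas.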

🔎 Similar Papers

Policy Gradient Methods for Risk-Sensitive Distributional Reinforcement Learning with Provable Convergence