🤖 AI Summary
This work addresses the underexplored problem of measuring variability in risk-averse reinforcement learning (RARL). We systematically evaluate nine common variability measures: variance, standard deviation, Gini deviation, mean deviation, mean-median deviation, inter-quantile range, CVaR deviation, semi-variance, and semi-standard deviation. Four of these have not previously been studied in RARL. Theoretically, we derive policy gradient formulas for the four unstudied measures, improve the gradient estimator for Gini deviation, and analyze the gradient properties of the measures. Methodologically, we embed the measures into REINFORCE and PPO to explicitly penalize the dispersion of returns. Empirically, variance-based measures lead to unstable policy updates, whereas CVaR deviation and Gini deviation perform consistently across stochastic environments, achieving high returns while learning risk-averse policies; mean deviation and semi-standard deviation are also competitive. Our work gives an overview of variability measures in RARL and offers practical guidance for risk-aware decision-making.
📝 Abstract
Risk-averse reinforcement learning (RARL) is critical for decision-making under uncertainty and is especially valuable in high-stakes applications. However, most existing works focus on risk measures, e.g., conditional value-at-risk (CVaR), while measures of variability remain underexplored. In this paper, we comprehensively study nine common measures of variability, namely Variance, Gini Deviation, Mean Deviation, Mean-Median Deviation, Standard Deviation, Inter-Quantile Range, CVaR Deviation, Semi-Variance, and Semi-Standard Deviation. Among them, four metrics have not been previously studied in RARL. We derive policy gradient formulas for these unstudied metrics, improve gradient estimation for Gini Deviation, analyze their gradient properties, and incorporate them into the REINFORCE and PPO frameworks to penalize the dispersion of returns. Our empirical study reveals that variance-based metrics lead to unstable policy updates. In contrast, CVaR Deviation and Gini Deviation show consistent performance across different sources of randomness and evaluation domains, achieving high returns while effectively learning risk-averse policies. Mean Deviation and Semi-Standard Deviation are also competitive across different scenarios. This work provides a comprehensive overview of variability measures in RARL, offering practical insights for risk-aware decision-making and guiding future research on risk metrics and RARL algorithms.
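To make the nine measures concrete, the following is a minimal sketch of how each could be estimated from a batch of sampled episode returns. It is not the paper's implementation. The function name `variability_measures`, the tail level `alpha`, and the specific definitions used (lower-tail CVaR, downside semi-variance, Gini deviation as half the mean absolute difference between two independent returns) are assumptions chosen for illustration; the paper's exact conventions may differ.

```python
import numpy as np


def variability_measures(returns, alpha=0.25):
    """Plug-in sample estimates of nine variability measures of episode returns.

    The definitions are common conventions and are assumptions of this
    sketch; they are not taken from the paper.
    """
    g = np.asarray(returns, dtype=float)
    n = g.size
    mean = g.mean()

    # Downside part of the centred returns (zero where the return beats the mean).
    downside = np.minimum(g - mean, 0.0)
    semi_variance = np.mean(downside ** 2)

    # Lower-tail CVaR: average of the worst ceil(alpha * n) returns.
    k = max(1, int(np.ceil(alpha * n)))
    cvar = np.sort(g)[:k].mean()

    return {
        "variance": np.var(g),
        "standard_deviation": np.std(g),
        # Mean absolute deviation from the mean.
        "mean_deviation": np.mean(np.abs(g - mean)),
        # Mean absolute deviation from the median.
        "mean_median_deviation": np.mean(np.abs(g - np.median(g))),
        # Half the mean absolute difference over all ordered sample pairs.
        "gini_deviation": 0.5 * np.mean(np.abs(g[:, None] - g[None, :])),
        # Distance between the (1 - alpha) and alpha quantiles.
        "inter_quantile_range": np.quantile(g, 1.0 - alpha) - np.quantile(g, alpha),
        # Gap between the mean return and the lower-tail CVaR.
        "cvar_deviation": mean - cvar,
        "semi_variance": semi_variance,
        "semi_standard_deviation": np.sqrt(semi_variance),
    }


# Usage on synthetic returns (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
sample_returns = rng.normal(loc=10.0, scale=2.0, size=1000)
for name, value in variability_measures(sample_returns).items():
    print(f"{name:>24s}: {value:.3f}")
```

In a dispersion-penalized objective of the kind described above, one of these quantities would be subtracted from the expected return with a trade-off weight. The estimators here operate on returns only and do not produce policy gradients.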