Risk-sensitive reinforcement learning using expectiles, shortfall risk and optimized certainty equivalent risk

📅 2026-02-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Standard reinforcement learning optimizes expected return and does not model risk sensitivity, which limits its applicability in high-stakes scenarios. Focusing on finite-horizon Markov decision processes, the paper treats three families of risk measures (expectiles, utility-based shortfall risk, and optimized certainty equivalent risk), deriving a policy gradient theorem for each and proposing a corresponding gradient estimator. The authors establish an $\mathcal{O}(1/m)$ mean-squared error bound for the estimators, where $m$ is the number of sampled trajectories. Under standard assumptions for policy gradient-type algorithms, they also show that the risk-sensitive objective is smooth, which yields convergence rate bounds to a stationary point for the overall algorithm. Numerical experiments on popular RL benchmarks validate the theoretical findings.
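To make the three risk measures concrete, here is a minimal Python sketch that estimates each one from a batch of sampled values. It is an illustration under stated assumptions, not the paper's estimators: the samples are treated as costs (larger is worse), the textbook definitions are used (the expectile as the root of its first-order condition, shortfall risk as the smallest shift `t` with mean `loss(x - t)` at most `lam`, and the OCE as the minimum over `eta` of `eta + mean(phi(x - eta))`), and the function names `expectile`, `shortfall_risk`, and `oce` are hypothetical. The paper's own sign conventions may differ.

```python
import numpy as np

def _bisect_decreasing(g, lo, hi, tol=1e-10, max_iter=200):
    """Root of a decreasing function g on [lo, hi], assuming g(lo) >= 0 >= g(hi)."""
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def expectile(samples, tau):
    """Sample expectile: solves tau*E[(X-t)+] = (1-tau)*E[(t-X)+] for t."""
    x = np.asarray(samples, dtype=float)
    def g(t):
        return (tau * np.mean(np.maximum(x - t, 0.0))
                - (1.0 - tau) * np.mean(np.maximum(t - x, 0.0)))
    return _bisect_decreasing(g, x.min(), x.max())

def shortfall_risk(samples, loss, lam):
    """Sample shortfall risk (cost convention): smallest t with mean(loss(x - t)) <= lam.

    Assumes `loss` is increasing, so the constraint function is decreasing in t.
    """
    x = np.asarray(samples, dtype=float)
    def h(t):
        return np.mean(loss(x - t)) - lam
    lo, hi, step = x.min(), x.max(), 1.0
    for _ in range(60):  # widen the bracket until it contains the root
        if h(lo) >= 0 and h(hi) <= 0:
            break
        if h(lo) < 0:
            lo -= step
        if h(hi) > 0:
            hi += step
        step *= 2.0
    else:
        raise ValueError("could not bracket the shortfall level")
    return _bisect_decreasing(h, lo, hi)

def oce(samples, phi, iters=200):
    """Sample OCE (cost convention): min over eta of eta + mean(phi(x - eta)).

    Assumes phi is convex with phi(0) = 0 and slope 1 admissible at 0, so a
    minimizer lies within the sample range; ternary search then suffices.
    """
    x = np.asarray(samples, dtype=float)
    def f(eta):
        return eta + np.mean(phi(x - eta))
    lo, hi = x.min(), x.max()
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) <= f(m2):
            hi = m2  # for convex f, a minimizer remains in [lo, m2]
        else:
            lo = m1
    return float(f(0.5 * (lo + hi)))
```

Two familiar special cases serve as sanity checks: with an exponential loss and threshold 1, the shortfall risk reduces to the entropic risk, and with `phi(z) = max(z, 0) / (1 - alpha)` the OCE reduces to the conditional value-at-risk at level `alpha`. The expectile at `tau = 0.5` is the sample mean.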

📝 Abstract

We propose risk-sensitive reinforcement learning algorithms catering to three families of risk measures, namely expectiles, utility-based shortfall risk and optimized certainty equivalent risk. For each risk measure, in the context of a finite horizon Markov decision process, we first derive a policy gradient theorem. Second, we propose estimators of the risk-sensitive policy gradient for each of the aforementioned risk measures, and establish $\mathcal{O}\left(1/m\right)$ mean-squared error bounds for our estimators, where $m$ is the number of trajectories. Further, under standard assumptions for policy gradient-type algorithms, we establish smoothness of the risk-sensitive objective, in turn leading to stationary convergence rate bounds for the overall risk-sensitive policy gradient algorithm that we propose. Finally, we conduct numerical experiments to validate the theoretical findings on popular RL benchmarks.
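The $\mathcal{O}(1/m)$ mean-squared error rate can be illustrated numerically. The sketch below is a risk-neutral stand-in, not the paper's risk-sensitive estimators: it uses the standard score-function (likelihood-ratio) gradient estimate for a softmax policy on a hypothetical one-step problem with fixed rewards, where the exact gradient is available in closed form. The names `true_gradient`, `score_function_estimate`, and `empirical_mse` are illustrative assumptions.

```python
import numpy as np

def softmax(theta):
    """Numerically stable softmax over action preferences."""
    z = np.exp(theta - theta.max())
    return z / z.sum()

def true_gradient(theta, rewards):
    """Exact gradient of J(theta) = sum_a pi_a * r_a for a softmax policy."""
    pi = softmax(theta)
    # dJ/dtheta_k = pi_k * (r_k - sum_a pi_a r_a)
    return pi * (rewards - pi @ rewards)

def score_function_estimate(theta, rewards, m, rng):
    """Average of R_i * grad log pi(a_i) over m sampled one-step trajectories."""
    pi = softmax(theta)
    actions = rng.choice(len(pi), size=m, p=pi)
    scores = np.eye(len(pi))[actions] - pi  # grad log pi(a) = e_a - pi
    returns = rewards[actions]
    return (returns[:, None] * scores).mean(axis=0)

def empirical_mse(theta, rewards, m, reps, rng):
    """Monte Carlo estimate of E||g_hat - g||^2 for batch size m."""
    g = true_gradient(theta, rewards)
    errors = [
        np.sum((score_function_estimate(theta, rewards, m, rng) - g) ** 2)
        for _ in range(reps)
    ]
    return float(np.mean(errors))
```

Because this estimator is an unbiased average of independent terms, its mean-squared error equals the per-sample variance divided by $m$. Calling `empirical_mse` with batch sizes $m$ and $4m$ should therefore give a ratio close to 4, up to Monte Carlo noise.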

Problem

Research questions and friction points this paper is trying to address.

risk-sensitive reinforcement learning

expectiles

shortfall risk

optimized certainty equivalent

policy gradient

Innovation

Methods, ideas, or system contributions that make the work stand out.

risk-sensitive reinforcement learning

policy gradient theorem

expectiles