Risk-sensitive reinforcement learning using expectiles, shortfall risk and optimized certainty equivalent risk

📅 2026-02-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limited ability of traditional reinforcement learning to model risk sensitivity, which hinders its use in high-stakes scenarios. Focusing on finite-horizon Markov decision processes, the paper treats three families of risk measures (expectiles, utility-based shortfall risk, and optimized certainty equivalent risk), derives a policy gradient theorem for each, and proposes corresponding gradient estimators. The estimators are built from sampled trajectories and come with $\mathcal{O}\left(1/m\right)$ mean-squared error bounds, where $m$ is the number of trajectories. Under standard assumptions for policy gradient-type algorithms, the authors establish smoothness of the risk-sensitive objective, which yields stationary convergence rate bounds for the overall algorithm. Numerical experiments on popular RL benchmarks are used to validate the theoretical findings.

📝 Abstract
We propose risk-sensitive reinforcement learning algorithms catering to three families of risk measures, namely expectiles, utility-based shortfall risk and optimized certainty equivalent risk. For each risk measure, in the context of a finite horizon Markov decision process, we first derive a policy gradient theorem. Second, we propose estimators of the risk-sensitive policy gradient for each of the aforementioned risk measures, and establish $\mathcal{O}\left(1/m\right)$ mean-squared error bounds for our estimators, where $m$ is the number of trajectories. Further, under standard assumptions for policy gradient-type algorithms, we establish smoothness of the risk-sensitive objective, in turn leading to stationary convergence rate bounds for the overall risk-sensitive policy gradient algorithm that we propose. Finally, we conduct numerical experiments to validate the theoretical findings on popular RL benchmarks.
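As an illustration of the three risk measures named in the abstract, the sketch below computes sample-based estimates of each from a list of losses. It relies on standard textbook definitions, which are an assumption here because the abstract does not state the paper's exact conventions (for example, rewards versus losses): the expectile at level tau as the root of tau*E[(X-e)^+] = (1-tau)*E[(e-X)^+]; utility-based shortfall risk as the smallest t with E[l(X-t)] <= lambda for an increasing loss function l; and, as one instance of an optimized certainty equivalent, CVaR via the Rockafellar-Uryasev minimization. All function names are hypothetical, and this is not the paper's policy gradient estimator.

```python
import math
import random


def bisect_root(f, lo, hi, tol=1e-10, max_iter=200):
    """Bisection for a non-increasing f with f(lo) >= 0 >= f(hi)."""
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies at or to the left of mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def expectile(xs, tau):
    """Sample expectile: root e of tau*E[(X-e)^+] - (1-tau)*E[(e-X)^+]."""
    n = len(xs)

    def g(e):
        up = sum(max(x - e, 0.0) for x in xs) / n
        down = sum(max(e - x, 0.0) for x in xs) / n
        return tau * up - (1.0 - tau) * down

    return bisect_root(g, min(xs), max(xs))


def ubsr(xs, loss, lam, lo, hi):
    """Sample shortfall risk: smallest t with mean(loss(x - t)) <= lam.

    Assumes `loss` is increasing and [lo, hi] brackets the solution.
    """
    n = len(xs)

    def g(t):
        return sum(loss(x - t) for x in xs) / n - lam

    return bisect_root(g, lo, hi)


def oce_cvar(xs, alpha):
    """CVaR at level alpha as an OCE instance (Rockafellar-Uryasev form).

    The objective is piecewise linear and convex in eta, so its minimum
    is attained at one of the sample points.
    """
    n = len(xs)

    def objective(eta):
        return eta + sum(max(x - eta, 0.0) for x in xs) / (n * (1.0 - alpha))

    return min(objective(eta) for eta in xs)


if __name__ == "__main__":
    random.seed(0)
    losses = [random.gauss(0.0, 1.0) for _ in range(500)]
    beta = 0.5  # hypothetical risk-aversion parameter for the exponential loss
    print("expectile(0.9):", expectile(losses, 0.9))
    print("entropic UBSR :", ubsr(losses, lambda z: math.exp(beta * z), 1.0,
                                  min(losses), max(losses)))
    print("CVaR(0.95)    :", oce_cvar(losses, 0.95))
```

Two sanity checks follow from these definitions: at tau = 0.5 the expectile reduces to the sample mean, and with the exponential loss and lambda = 1 the shortfall risk equals the entropic risk (1/beta) * log E[exp(beta * X)], which is why the sample range brackets the solution in that case.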
Problem

Research questions and friction points this paper is trying to address.

risk-sensitive reinforcement learning
expectiles
shortfall risk
optimized certainty equivalent
policy gradient
Innovation

Methods, ideas, or system contributions that make the work stand out.

risk-sensitive reinforcement learning
policy gradient theorem
expectiles
shortfall risk
optimized certainty equivalent
Sumedh Gupte
Researcher at TCS Research
risk-sensitive learning, stochastic optimization
Shrey Rakeshkumar Patel
Indian Institute of Technology Madras
Soumen Pachal
Indian Institute of Technology Madras
Prashanth L. A.
Indian Institute of Technology Madras
Sanjay P. Bhat
Tata Consultancy Services Limited
Nonlinear dynamical systems, stability theory, attitude motion, mathematical finance