Actor-Critic Algorithm for Dynamic Expectile and CVaR

📅 2026-05-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the difficulty of optimizing dynamic risk measures, such as expectiles and Conditional Value-at-Risk (CVaR), under stochastic policies. Existing policy gradient methods for dynamic risk require perturbing the state transitions, and value estimation typically relies on a model of the environment. To remove both requirements, the authors propose a surrogate policy gradient, under softmax policy parameterization, that needs no transition perturbation, and they exploit the elicitability of the risk statistics to learn dynamic-risk values without a model. Building on Expected SARSA and Expected Policy Gradient, they combine these components into a single off-policy actor-critic algorithm. The work is presented as the first method to estimate policy gradients for dynamic risk without transition perturbation, thereby introducing elicitable risk measures into model-free reinforcement learning. In domains where risk-averse behavior can be verified, the algorithm learns risk-sensitive policies that avoid hazardous outcomes and consistently outperforms existing baselines.
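Elicitability matters here because an elicitable statistic is the minimizer of an expected scoring (loss) function, so it can be estimated from samples alone, without an environment model. The sketch below is an illustration, not the authors' code: it computes the tau-expectile from the first-order condition of its asymmetric squared loss, and the lower-tail CVaR through the Rockafellar-Uryasev representation, whose maximizer is a VaR (quantile). CVaR on its own is not elicitable, but it is jointly elicitable with VaR, which is why the two appear together. The function names `expectile` and `cvar_lower` are hypothetical, and we assume the samples are rewards (higher is better), so risk aversion looks at the lower tail.

```python
import numpy as np


def expectile(samples, tau, iters=100):
    """tau-expectile of a sample: minimiser of E[|tau - 1{X < e}| (X - e)^2].

    Solved by bisection on the first-order condition
    tau * E[(X - e)+] = (1 - tau) * E[(e - X)+]; the left side minus the
    right side is non-increasing in e, positive at min(X), negative at max(X).
    """
    x = np.asarray(samples, dtype=float)
    lo, hi = x.min(), x.max()
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        upper = np.mean(np.maximum(x - mid, 0.0))
        lower = np.mean(np.maximum(mid - x, 0.0))
        if tau * upper - (1.0 - tau) * lower > 0.0:
            lo = mid  # root lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def cvar_lower(samples, alpha):
    """Lower-tail CVaR_alpha of rewards (mean of the worst alpha fraction).

    Uses the Rockafellar-Uryasev form max_z { z - E[(z - X)+] / alpha }.
    The objective is concave and piecewise linear with kinks at the samples,
    so its maximum is attained at a sample point; that maximiser is a VaR.
    """
    x = np.asarray(samples, dtype=float)
    z = x[:, None]  # candidate thresholds, one per sample
    shortfall = np.mean(np.maximum(z - x[None, :], 0.0), axis=1)
    objective = x - shortfall / alpha
    return float(objective.max())


rewards = np.arange(1.0, 11.0)        # 1, 2, ..., 10
print(expectile(rewards, 0.5))        # approximately 5.5: tau = 0.5 gives the mean
print(expectile(rewards, 0.1) < 5.5)  # True: tau < 0.5 is pessimistic
print(cvar_lower(rewards, 0.2))       # approximately 1.5: mean of the two worst rewards
```

Setting tau below 0.5 (or alpha well below 1) shifts the statistic toward the bad outcomes, which is the sense in which both measures are risk-averse.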
📝 Abstract
Optimizing dynamic risk with stochastic policies is challenging in both policy updates and value learning. The former typically requires transition perturbation, while the latter may rely on model-based approaches. To address these challenges, we propose a surrogate policy gradient that requires no transition perturbation under softmax policy parameterization. We further develop model-free value learning methods for dynamic expectile and conditional value-at-risk by leveraging elicitability. Finally, inspired by Expected SARSA and Expected Policy Gradient, we construct a model-free off-policy actor-critic algorithm. Empirical results in domains with verifiable risk-averse behavior show that our algorithm learns risk-averse policies and consistently outperforms other existing methods.
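For intuition about softmax parameterization and the Expected Policy Gradient idea the abstract cites, the sketch below computes, for a single state, the gradient of sum_a pi(a|s) Q(s,a) with respect to tabular softmax logits, summing over all actions instead of sampling one. With V = sum_a pi(a) Q(a), the derivative with respect to logit b is pi(b) (Q(b) - V). This is the generic risk-neutral all-actions gradient, not the paper's surrogate gradient for dynamic risk, whose exact form is not reproduced here; all names are hypothetical, and a finite-difference check is included to confirm the closed form.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of action logits."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()


def all_actions_softmax_grad(logits, q_values):
    """Gradient of sum_a pi(a) * Q(a) w.r.t. the logits of a softmax policy.

    d pi(a) / d logit(b) = pi(a) * (1{a == b} - pi(b)), so the derivative
    of the expected value is pi(b) * (Q(b) - V): a probability-weighted
    advantage, computed for every action at once.
    """
    pi = softmax(logits)
    v = pi @ q_values
    return pi * (q_values - v)


def numerical_grad(logits, q_values, eps=1e-6):
    """Central finite differences, used only to check the closed form."""
    grad = np.zeros_like(logits)
    for b in range(len(logits)):
        step = np.zeros_like(logits)
        step[b] = eps
        f_plus = softmax(logits + step) @ q_values
        f_minus = softmax(logits - step) @ q_values
        grad[b] = (f_plus - f_minus) / (2 * eps)
    return grad


logits = np.array([0.2, -1.0, 0.5])
q_values = np.array([1.0, 3.0, -0.5])
print(all_actions_softmax_grad(logits, q_values))
print(np.allclose(all_actions_softmax_grad(logits, q_values),
                  numerical_grad(logits, q_values), atol=1e-6))  # True
```

Because the gradient uses Q for every action rather than a single sampled action, it avoids the extra variance of the score-function estimator; the components also sum to zero, since shifting all logits equally leaves the policy unchanged.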
Problem

Research questions and friction points this paper is trying to address.

dynamic risk
stochastic policies
policy gradient
value learning
risk-averse
Innovation

Methods, ideas, or system contributions that make the work stand out.

Actor-Critic
dynamic risk
elicitable statistics
model-free RL
risk-sensitive optimization
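To make the "elicitable statistics" and "model-free RL" entries concrete, the sketch below shows one way an elicitable statistic can drive a model-free, off-policy critic: a tabular stochastic-gradient step on the expectile loss, with an Expected SARSA target that averages the next action under the target policy pi, so transitions may come from any behavior policy. This is an assumed formulation for illustration only. In particular, applying the expectile to the one-step target while taking the expectation over the next action is our choice, and the paper's dynamic-expectile critic (and its CVaR counterpart) may be defined differently. The function name and arguments are hypothetical.

```python
import numpy as np


def expectile_expected_sarsa_update(q, pi, s, a, r, s_next, done,
                                    tau, gamma=0.99, lr=0.1):
    """One expectile-TD step on a tabular critic, updated in place.

    q  : array [n_states, n_actions], critic estimates
    pi : array [n_states, n_actions], target-policy probabilities
    The step is SGD on |tau - 1{y < q}| * (y - q)^2; holding next-state
    values fixed, its expected change is zero at the tau-expectile of y.
    """
    # Expected SARSA: average Q(s', .) under the target policy instead of
    # using whichever action the behaviour policy took next.
    next_value = 0.0 if done else float(pi[s_next] @ q[s_next])
    target = r + gamma * next_value
    td = target - q[s, a]
    # Asymmetric weight: tau when q under-estimates, 1 - tau when it over-estimates.
    weight = tau if td >= 0.0 else 1.0 - tau
    q[s, a] += lr * weight * td
    return td


q = np.zeros((1, 2))
pi = np.full((1, 2), 0.5)
for step in range(20_000):
    reward = 10.0 if step % 2 == 0 else 0.0
    expectile_expected_sarsa_update(q, pi, s=0, a=0, r=reward, s_next=0,
                                    done=True, tau=0.2, lr=0.01)
print(q[0, 0])  # settles near 2.0
```

In the usage example, a single terminal state-action pair receives rewards alternating between 10 and 0. With tau = 0.2, the expectile condition 0.2 (10 - e) = 0.8 e gives e = 2, well below the mean reward of 5, which illustrates how a low expectile makes the critic pessimistic.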