Actor-Critic Algorithm for Dynamic Expectile and CVaR

📅 2026-05-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two challenges in optimizing dynamic risk measures, such as expectiles and Conditional Value-at-Risk (CVaR), under stochastic policies: policy gradient methods typically require perturbing state transitions, and value estimation may rely on an environment model. The authors propose a surrogate policy gradient, derived under a softmax policy parameterization, that needs no transition perturbation, and they use the elicitability of the risk statistics to learn dynamic expectile and CVaR values in a model-free way. Drawing on Expected SARSA and Expected Policy Gradient, they combine these components into a model-free off-policy Actor-Critic algorithm. Empirical results in domains with verifiable risk-averse behavior show that the algorithm learns risk-averse policies and consistently outperforms existing methods.
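To make the role of elicitability concrete: an expectile is the minimizer of an asymmetric squared loss, so it can be estimated directly from samples without a model of the distribution. The sketch below is a generic illustration of that textbook property, not the paper's value-learning rule. The function names, the full-batch gradient descent, the step size, and the convention that `tau < 0.5` weights low outcomes more heavily are all assumptions made for this example.

```python
def expectile_loss_grad(q, y, tau):
    """Gradient w.r.t. q of the asymmetric squared loss |tau - 1{y < q}| * (y - q)^2."""
    weight = tau if y >= q else 1.0 - tau
    return -2.0 * weight * (y - q)


def estimate_expectile(samples, tau, lr=0.1, steps=500):
    """Estimate the tau-expectile of `samples` by full-batch gradient descent.

    Hypothetical helper for illustration only; a sample-based learner would
    apply the same gradient one observation at a time.
    """
    data = list(samples)
    q = 0.0
    for _ in range(steps):
        grad = sum(expectile_loss_grad(q, y, tau) for y in data) / len(data)
        q -= lr * grad
    return q


if __name__ == "__main__":
    returns = [0.0, 1.0, 2.0, 3.0, 10.0]
    # tau = 0.5 recovers the mean; a smaller tau pulls the estimate toward low outcomes.
    print(estimate_expectile(returns, 0.5))
    print(estimate_expectile(returns, 0.1))
```

At `tau = 0.5` the loss is the ordinary squared error, so the estimate coincides with the sample mean; moving `tau` away from 0.5 shifts the statistic toward one tail.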

📝 Abstract

Optimizing dynamic risk with stochastic policies is challenging in both policy updates and value learning. The former typically requires transition perturbation, while the latter may rely on model-based approaches. To address these challenges, we propose a surrogate policy gradient without transition perturbation under softmax policy parameterization. We further develop model-free value learning methods for dynamic expectile and conditional value-at-risk by leveraging elicitability. Finally, inspired by Expected SARSA and Expected Policy Gradient, a model-free off-policy actor-critic algorithm is constructed. Empirical results in domains with verifiable risk-averse behavior show that our algorithm can learn risk-averse policies and consistently outperforms existing methods.
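For readers unfamiliar with Expected SARSA, the sketch below shows the standard risk-neutral bootstrap target under a softmax policy: the next-state value is the policy-weighted average of action values rather than the value of a single sampled action. This is the classical form only. How the paper replaces the expectation with a dynamic expectile or CVaR is not described in the abstract, so it is not attempted here. All function and parameter names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of action preferences."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_sarsa_target(reward, gamma, next_q_values, next_logits, done=False):
    """Standard (risk-neutral) Expected SARSA target.

    target = r + gamma * sum_a pi(a | s') * Q(s', a), with pi given by a
    softmax over `next_logits`. Terminal transitions bootstrap from nothing.
    """
    if done:
        return reward
    probs = softmax(next_logits)
    return reward + gamma * sum(p * q for p, q in zip(probs, next_q_values))


if __name__ == "__main__":
    # Two actions with equal preference, so the policy is uniform.
    print(expected_sarsa_target(1.0, 0.9, [1.0, 3.0], [0.0, 0.0]))
```

Because the target averages over all actions under the current policy, it does not depend on which action the behavior policy actually took in the next state, which is what makes this style of update usable off-policy.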

Problem

Research questions and friction points this paper is trying to address.

dynamic risk

stochastic policies

policy gradient

value learning

risk-averse

Innovation

Methods, ideas, or system contributions that make the work stand out.

Actor-Critic

dynamic risk

elicitable statistics