Robust Reinforcement Learning with Dynamic Distortion Risk Measures

📅 2024-09-16
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
📄 PDF

career value

202K/year
🤖 AI Summary
Reinforcement learning (RL) faces dual challenges of environmental uncertainty and risk-sensitive decision-making. Method: This paper proposes a robust, risk-aware RL framework that—novelty—integrates dynamic distortion risk measures into Wasserstein distributionally robust optimization, jointly modeling model uncertainty and risk sensitivity. Theoretically, we derive a quantile-based policy gradient theorem and introduce a neural risk estimation method grounded in strictly consistent scoring functions, implemented within an Actor-Critic architecture. Contribution/Results: Experiments on portfolio allocation demonstrate that the framework significantly enhances policy robustness under distributional shift, improves risk-adjusted returns, and ensures time-consistent decisions. It establishes a new paradigm for risk-sensitive, cross-environment policy learning.

Technology Category

Application Category

📝 Abstract
In a reinforcement learning (RL) setting, the agent's optimal strategy heavily depends on her risk preferences and the underlying model dynamics of the training environment. These two aspects influence the agent's ability to make well-informed and time-consistent decisions when facing testing environments. In this work, we devise a framework to solve robust risk-aware RL problems where we simultaneously account for environmental uncertainty and risk with a class of dynamic robust distortion risk measures. Robustness is introduced by considering all models within a Wasserstein ball around a reference model. We estimate such dynamic robust risk measures using neural networks by making use of strictly consistent scoring functions, derive policy gradient formulae using the quantile representation of distortion risk measures, and construct an actor-critic algorithm to solve this class of robust risk-aware RL problems. We demonstrate the performance of our algorithm on a portfolio allocation example.
Problem

Research questions and friction points this paper is trying to address.

Develops robust risk-aware RL with dynamic distortion measures
Addresses environmental uncertainty and risk simultaneously
Uses Wasserstein ball for model robustness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic robust distortion risk measures framework
Neural networks estimate risk with scoring functions
Actor-critic algorithm for robust risk-aware RL
🔎 Similar Papers
No similar papers found.