Robust Reinforcement Learning with Dynamic Distortion Risk Measures

📅 2024-09-16

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

career value

190K/year

🤖 AI Summary

Reinforcement learning (RL) faces dual challenges of environmental uncertainty and risk-sensitive decision-making. Method: This paper proposes a robust, risk-aware RL framework that—novelty—integrates dynamic distortion risk measures into Wasserstein distributionally robust optimization, jointly modeling model uncertainty and risk sensitivity. Theoretically, we derive a quantile-based policy gradient theorem and introduce a neural risk estimation method grounded in strictly consistent scoring functions, implemented within an Actor-Critic architecture. Contribution/Results: Experiments on portfolio allocation demonstrate that the framework significantly enhances policy robustness under distributional shift, improves risk-adjusted returns, and ensures time-consistent decisions. It establishes a new paradigm for risk-sensitive, cross-environment policy learning.

Technology Category

Application Category

📝 Abstract

In a reinforcement learning (RL) setting, the agent's optimal strategy heavily depends on her risk preferences and the underlying model dynamics of the training environment. These two aspects influence the agent's ability to make well-informed and time-consistent decisions when facing testing environments. In this work, we devise a framework to solve robust risk-aware RL problems where we simultaneously account for environmental uncertainty and risk with a class of dynamic robust distortion risk measures. Robustness is introduced by considering all models within a Wasserstein ball around a reference model. We estimate such dynamic robust risk measures using neural networks by making use of strictly consistent scoring functions, derive policy gradient formulae using the quantile representation of distortion risk measures, and construct an actor-critic algorithm to solve this class of robust risk-aware RL problems. We demonstrate the performance of our algorithm on a portfolio allocation example.

Problem

Research questions and friction points this paper is trying to address.

Develops robust risk-aware RL with dynamic distortion measures

Addresses environmental uncertainty and risk simultaneously

Uses Wasserstein ball for model robustness

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic robust distortion risk measures framework

Neural networks estimate risk with scoring functions

Actor-critic algorithm for robust risk-aware RL

🔎 Similar Papers

No similar papers found.