Reinforcement Learning with Markov Risk Measures and Multipattern Risk Approximation

📅 2026-05-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses risk modeling and efficient learning in risk-averse finite-horizon Markov decision processes. It introduces mini-batch measures, a special class of Markov coherent risk measures, together with multipattern risk-averse problems, a class that generalizes linear systems. Both concepts are combined in a feature-based Q-learning method with multipattern Q-factor approximation, which attains a high-probability regret bound of 𝒪(H²Nᴴ√K). An economical variant of the method streamlines the policy evaluation (backward) step. The theoretical results are illustrated on a stochastic assignment problem and a short-horizon multi-armed bandit problem.
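The regret bound 𝒪(H²Nᴴ√K) involves H, the horizon, N, the mini-batch size, and K, the number of episodes. The sketch below evaluates this expression with the constant hidden by the 𝒪-notation set to 1. That constant is an assumption, since the bound does not specify it, and the helper names are hypothetical. It shows two consequences. The per-episode bound H²Nᴴ/√K vanishes as K grows, so the regret is sublinear in K. The factor Nᴴ, however, makes the bound grow exponentially with the horizon.

```python
import math


def regret_bound(horizon: int, batch_size: int, episodes: int) -> float:
    """Evaluate H^2 * N^H * sqrt(K), taking the constant hidden by O(.) to be 1."""
    return horizon**2 * batch_size**horizon * math.sqrt(episodes)


def average_regret_bound(horizon: int, batch_size: int, episodes: int) -> float:
    """Per-episode bound H^2 * N^H / sqrt(K); it tends to 0 as K grows."""
    return regret_bound(horizon, batch_size, episodes) / episodes


if __name__ == "__main__":
    # Exponential growth in the horizon H for fixed N = 3 and K = 10_000.
    for h in (2, 3, 4):
        print(h, regret_bound(h, 3, 10_000))
    # Decay of the per-episode bound in K for fixed H = 2 and N = 3.
    for k in (100, 10_000, 1_000_000):
        print(k, average_regret_bound(2, 3, k))
```

For example, with N = 3 the bound grows by a factor of more than 5 when H goes from 3 to 4. Increasing K a hundredfold shrinks the per-episode bound only tenfold.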
📝 Abstract
For a risk-averse finite-horizon Markov Decision Problem, we introduce a special class of Markov coherent risk measures, called mini-batch measures. We also define the class of multipattern risk-averse problems that generalizes the class of linear systems. We use both concepts in a feature-based $Q$-learning method with multipattern $Q$-factor approximation and we prove a high-probability regret bound of $\mathcal{O}\big(H^2 N^H \sqrt{ K}\big)$, where $H$ is the horizon, $N$ is the mini-batch size, and $K$ is the number of episodes. We also propose an economical version of the $Q$-learning method that streamlines the policy evaluation (backward) step. The theoretical results are illustrated on a stochastic assignment problem and a short-horizon multi-armed bandit problem.
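The abstract does not spell out how mini-batch measures are defined, so the sketch below implements neither them nor the paper's Q-learning method. It shows only the standard backward recursion that a Markov coherent risk measure induces on a finite-horizon problem. As an assumption, it uses CVaR (conditional value-at-risk) of the next-stage cost as the one-step coherent risk mapping. The names `cvar`, `risk_averse_backward`, `solve_example`, and the two-state example are hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

# A discrete distribution over costs: (probability, value) pairs.
Dist = List[Tuple[float, float]]


def cvar(dist: Dist, alpha: float) -> float:
    """Upper-tail CVaR of a discrete cost distribution (Rockafellar-Uryasev form).

    CVaR_alpha(Z) = min_eta { eta + E[(Z - eta)_+] / (1 - alpha) }.
    The objective is convex and piecewise linear in eta with kinks at the
    support points, so the minimum is attained at one of them.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    best = float("inf")
    for _, eta in dist:
        excess = sum(p * max(z - eta, 0.0) for p, z in dist)
        best = min(best, eta + excess / (1.0 - alpha))
    return best


def risk_averse_backward(
    states: Sequence[Hashable],
    actions: Sequence[Hashable],
    horizon: int,
    cost: Callable[[Hashable, Hashable], float],
    transition: Callable[[Hashable, Hashable], Dict[Hashable, float]],
    alpha: float,
) -> Tuple[Dict[Hashable, float], List[Dict[Hashable, Hashable]]]:
    """Backward recursion v_t(x) = min_u [ c(x, u) + CVaR_alpha(v_{t+1}(X') | x, u) ].

    Returns the stage-0 value function and a list of per-stage greedy policies.
    """
    v = {x: 0.0 for x in states}  # terminal value v_H = 0
    policies: List[Dict[Hashable, Hashable]] = []
    for _ in range(horizon):
        new_v, decision = {}, {}
        for x in states:
            best_u, best_q = None, float("inf")
            for u in actions:
                # Distribution of the next-stage value under action u.
                dist = [(p, v[y]) for y, p in transition(x, u).items()]
                q = cost(x, u) + cvar(dist, alpha)  # risk-averse Q-factor
                if q < best_q:
                    best_u, best_q = u, q
            new_v[x], decision[x] = best_q, best_u
        v = new_v
        policies.append(decision)
    policies.reverse()  # policies[t] is the decision rule at stage t
    return v, policies


# Hypothetical two-state example: "risky" is cheaper now but may lead to "bad".
COSTS = {("good", "safe"): 1.0, ("good", "risky"): 0.0,
         ("bad", "safe"): 6.0, ("bad", "risky"): 5.0}


def example_transition(x, u):
    return {"good": 1.0} if u == "safe" else {"good": 0.9, "bad": 0.1}


def solve_example(alpha: float):
    return risk_averse_backward(["good", "bad"], ["safe", "risky"], 2,
                                lambda x, u: COSTS[(x, u)], example_transition, alpha)
```

With `alpha = 0` the CVaR reduces to the expectation, and the recursion becomes ordinary risk-neutral dynamic programming. In this toy example, `solve_example(0.0)` chooses "risky" in state "good" at stage 0, with value 0.5. `solve_example(0.95)` switches to "safe", with value 1.0, because the 10% chance of reaching "bad" dominates the upper tail.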
Problem

Markov Decision Process
Risk-Averse Reinforcement Learning
Coherent Risk Measures
Multipattern Risk Approximation
Finite-Horizon

Innovation

Markov risk measures
mini-batch measures
multipattern risk approximation
risk-averse reinforcement learning
Q-learning