Beyond discounted returns: Robust Markov decision processes with average and Blackwell optimality

📅 2023-12-06
🏛️ arXiv.org
📈 Citations: 4
Influential: 2
📄 PDF
🤖 AI Summary
This paper investigates the existence and construction of optimal policies for robust Markov decision processes (RMDPs) under the average-reward criterion, focusing on two uncertainty structures: *sa*-rectangularity and *s*-rectangularity. Theoretically, it establishes—for the first time—that average-optimal policies may fail to exist or require history dependence in *s*-rectangular RMDPs; in contrast, it proves that ε-Blackwell optimal policies always exist under *sa*-rectangularity and provides a practical sufficient condition for the existence of Blackwell optimal policies. Methodologically, the work integrates robust optimization, stochastic games, dynamic programming, and fixed-point analysis, and develops computationally tractable algorithms based on policy iteration and linear programming to compute average-optimal rewards. Key contributions include: (i) establishing the theoretical superiority of *sa*-rectangularity for long-run robust decision-making; (ii) rigorously linking average optimality and Blackwell optimality; and (iii) proposing the first computationally feasible algorithmic framework for average-optimal policy synthesis in RMDPs.
📝 Abstract
Robust Markov Decision Processes (RMDPs) are a widely used framework for sequential decision-making under parameter uncertainty. RMDPs have been extensively studied when the objective is to maximize the discounted return, but little is known for average optimality (optimizing the long-run average of the rewards obtained over time) and Blackwell optimality (remaining discount optimal for all discount factors sufficiently close to ). In this paper, we prove several foundational results for RMDPs beyond the discounted return. We show that average optimal policies can be chosen stationary and deterministic for sa-rectangular RMDPs but, perhaps surprisingly, we show that for s-rectangular RMDPs average optimal policies may not exist, and if they exist, may need to be history-dependent (Markovian). We also study Blackwell optimality for sa-rectangular RMDPs, where we show that $epsilon$-Blackwell optimal policies always exist, although Blackwell optimal policies may not exist. We also provide a sufficient condition for their existence, which encompasses virtually any examples from the literature. We then discuss the connection between average and Blackwell optimality, and we describe several algorithms to compute the optimal average return. Interestingly, our approach leverages the connections between RMDPs and stochastic games. Overall, our paper emphasizes the superior practical properties of distance-based sa-rectangular models over s-rectangular models for average and Blackwell optimality.
Problem

Research questions and friction points this paper is trying to address.

Uncertainty
Markov Decision Processes
Blackwell Optimality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Robust Markov Decision Processes
Optimality in sa-rectangular and s-rectangular RMDPs
Near-optimal Strategies
🔎 Similar Papers
No similar papers found.
J
Julien Grand-Clement
Information Systems and Operations Management Department, HEC Paris
Marek Petrik
Marek Petrik
University of New Hampshire
Machine Learning
N
Nicolas Vieille
Economics and Decision Sciences Department, HEC Paris