Best-Effort Policies for Robust Markov Decision Processes

📅 2025-08-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In robust Markov decision processes (RMDPs), several policies can be optimal in the worst case yet differ in expected return under non-adversarial transition probabilities. To address this, we propose the optimal robust best-effort (ORBE) policy: it attains the optimal worst-case expected return and additionally achieves a maximal expected return under transition probabilities that are not fully adversarial. The criterion draws on the game-theoretic notions of dominance and best-effort rather than focusing on the worst case alone. We prove that ORBE policies always exist, characterize their structure, and present an algorithm that computes them with a small overhead compared to standard robust value iteration, which is efficient under s-rectangular uncertainty. Numerical experiments show the feasibility of the approach, and ORBE policies offer a principled tie-breaker among optimal robust policies.

📝 Abstract

We study the common generalization of Markov decision processes (MDPs) with sets of transition probabilities, known as robust MDPs (RMDPs). A standard goal in RMDPs is to compute a policy that maximizes the expected return under an adversarial choice of the transition probabilities. If the uncertainty in the probabilities is independent between the states, known as s-rectangularity, such optimal robust policies can be computed efficiently using robust value iteration. However, there might still be multiple optimal robust policies, which, while equivalent with respect to the worst-case, reflect different expected returns under non-adversarial choices of the transition probabilities. Hence, we propose a refined policy selection criterion for RMDPs, drawing inspiration from the notions of dominance and best-effort in game theory. Instead of seeking a policy that only maximizes the worst-case expected return, we additionally require the policy to achieve a maximal expected return under different (i.e., not fully adversarial) transition probabilities. We call such a policy an optimal robust best-effort (ORBE) policy. We prove that ORBE policies always exist, characterize their structure, and present an algorithm to compute them with a small overhead compared to standard robust value iteration. ORBE policies offer a principled tie-breaker among optimal robust policies. Numerical experiments show the feasibility of our approach.
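The tie described in the abstract can be illustrated with a minimal sketch. This is not the paper's ORBE algorithm: it runs plain robust value iteration on a hypothetical three-state RMDP, assuming finite uncertainty sets that are independent per state-action pair (a special case of s-rectangularity) and deterministic policies. The model, the discount factor, and all names (`REWARD`, `UNCERTAINTY`, `optimal_robust_actions`) are invented for illustration.

```python
# Toy RMDP invented for illustration (not taken from the paper).
GAMMA = 0.9  # discount factor (assumed)

# REWARD[s][a]: immediate reward for taking action a in state s.
REWARD = {
    0: {"a": 0.0, "b": 0.0},
    1: {"stay": 1.0},
    2: {"stay": 0.0},
}

# UNCERTAINTY[s][a]: finite list of candidate next-state distributions.
# In state 0 the adversary can send either action to state 2 with certainty.
UNCERTAINTY = {
    0: {
        "a": [[0.0, 0.5, 0.5], [0.0, 0.0, 1.0]],
        "b": [[0.0, 0.9, 0.1], [0.0, 0.0, 1.0]],
    },
    1: {"stay": [[0.0, 1.0, 0.0]]},
    2: {"stay": [[0.0, 0.0, 1.0]]},
}


def q_value(s, a, p, values):
    """One-step return of action a in state s under next-state distribution p."""
    return REWARD[s][a] + GAMMA * sum(pi * v for pi, v in zip(p, values))


def worst_case_q(s, a, values):
    """Adversarial choice: the candidate distribution minimizing the return."""
    return min(q_value(s, a, p, values) for p in UNCERTAINTY[s][a])


def robust_value_iteration(tol=1e-10, max_iter=10_000):
    """Iterate the robust Bellman backup V(s) = max_a min_p Q(s, a, p)."""
    values = [0.0] * len(UNCERTAINTY)
    for _ in range(max_iter):
        new_values = [
            max(worst_case_q(s, a, values) for a in UNCERTAINTY[s])
            for s in sorted(UNCERTAINTY)
        ]
        done = max(abs(x - y) for x, y in zip(new_values, values)) < tol
        values = new_values
        if done:
            break
    return values


def optimal_robust_actions(s, values, eps=1e-8):
    """All actions whose worst-case return matches the robust value of s."""
    best = max(worst_case_q(s, a, values) for a in UNCERTAINTY[s])
    return [a for a in UNCERTAINTY[s] if worst_case_q(s, a, values) >= best - eps]


values = robust_value_iteration()
ties = optimal_robust_actions(0, values)

# Compare the tied actions under the first (non-adversarial) candidate.
# Using the robust values for successor states is exact here only because
# states 1 and 2 have singleton uncertainty sets.
nominal = {a: q_value(0, a, UNCERTAINTY[0][a][0], values) for a in ties}
print(ties, nominal)
```

In this toy model both actions in state 0 are optimal in the worst case, since the adversary can force a return of 0 either way, but action `b` yields roughly 8.1 under the non-adversarial candidate versus roughly 4.5 for `a`. Worst-case optimization alone cannot separate the two, which is the kind of tie a best-effort criterion is meant to break.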

Problem

Research questions and friction points this paper is trying to address.

Defines best-effort policies for robust Markov decision processes (RMDPs)

Addresses the fact that multiple optimal robust policies can differ in expected return under non-adversarial transition probabilities

Provides a tie-breaking criterion among optimal robust policies that goes beyond worst-case optimization

Innovation

Methods, ideas, or system contributions that make the work stand out.

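Optimal robust best-effort (ORBE) policy selection criterion

Inspiration from the game-theoretic notions of dominance and best-effort

Algorithm that computes ORBE policies with a small overhead compared to standard robust value iteration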

🔎 Similar Papers

Order-Optimal Regret with Novel Policy Gradient Approaches in Infinite-Horizon Average Reward MDPs