Best-Effort Policies for Robust Markov Decision Processes

📅 2025-08-11
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In robust Markov decision processes (RMDPs), multiple optimal robust policies often exhibit substantial performance disparities under non-adversarial transitions. To address this, we propose the “Optimal Robust Best-Effort” (ORBE) policy: it guarantees optimal worst-case expected return while simultaneously maximizing expected return across all non-worst-case transition probabilities. Drawing inspiration from game-theoretic notions of dominance and best-effort behavior, ORBE transcends the traditional sole focus on worst-case optimization. Leveraging an s-rectangular uncertainty set, we extend robust value iteration and devise a low-complexity ORBE algorithm. We rigorously establish the existence and structural properties of ORBE policies. Numerical experiments demonstrate that ORBE preserves robustness while significantly improving average-case performance, offering a more refined and practical criterion for selecting robust policies.

📝 Abstract
We study the common generalization of Markov decision processes (MDPs) with sets of transition probabilities, known as robust MDPs (RMDPs). A standard goal in RMDPs is to compute a policy that maximizes the expected return under an adversarial choice of the transition probabilities. If the uncertainty in the probabilities is independent between the states, known as s-rectangularity, such optimal robust policies can be computed efficiently using robust value iteration. However, there might still be multiple optimal robust policies, which, while equivalent with respect to the worst-case, reflect different expected returns under non-adversarial choices of the transition probabilities. Hence, we propose a refined policy selection criterion for RMDPs, drawing inspiration from the notions of dominance and best-effort in game theory. Instead of seeking a policy that only maximizes the worst-case expected return, we additionally require the policy to achieve a maximal expected return under different (i.e., not fully adversarial) transition probabilities. We call such a policy an optimal robust best-effort (ORBE) policy. We prove that ORBE policies always exist, characterize their structure, and present an algorithm to compute them with a small overhead compared to standard robust value iteration. ORBE policies offer a principled tie-breaker among optimal robust policies. Numerical experiments show the feasibility of our approach.
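The robust value iteration described above can be sketched on a toy example. The snippet below is a minimal illustration, not the paper's algorithm: all numbers are hypothetical, and for simplicity the uncertainty set is a finite collection of candidate transition kernels per state-action pair ((s,a)-rectangular), a special case of the s-rectangular sets the paper considers. The adversary picks the worst kernel in the inner minimization; the agent then maximizes over actions.

```python
import numpy as np

# Toy 2-state, 2-action robust MDP (all numbers hypothetical).
n_states, n_actions = 2, 2
gamma = 0.9
rewards = np.array([[1.0, 0.0],
                    [0.0, 2.0]])  # rewards[s, a]

# U[(s, a)]: finite set of candidate next-state distributions.
U = {
    (0, 0): [np.array([0.9, 0.1]), np.array([0.5, 0.5])],
    (0, 1): [np.array([0.2, 0.8]), np.array([0.4, 0.6])],
    (1, 0): [np.array([0.7, 0.3]), np.array([0.3, 0.7])],
    (1, 1): [np.array([0.1, 0.9]), np.array([0.6, 0.4])],
}

def robust_value_iteration(tol=1e-8, max_iter=10_000):
    V = np.zeros(n_states)
    for _ in range(max_iter):
        V_new = np.empty_like(V)
        for s in range(n_states):
            # Robust Bellman update: inner min over the uncertainty set
            # (adversarial kernel), outer max over actions.
            q = [min(rewards[s, a] + gamma * p @ V for p in U[(s, a)])
                 for a in range(n_actions)]
            V_new[s] = max(q)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V

V_robust = robust_value_iteration()
print(V_robust)
```

Since the robust Bellman operator is a gamma-contraction, the iteration converges to the unique robust value; the ORBE refinement then operates among policies achieving this value.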
Problem

Research questions and friction points this paper is trying to address.

Develops best-effort policies for robust Markov decision processes (RMDPs)
Addresses performance disparities among multiple optimal robust policies under non-adversarial transition probabilities
Provides a principled tie-breaker criterion beyond pure worst-case optimization in RMDPs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimal robust best-effort (ORBE) policy selection criterion
Inspiration from game-theoretic notions of dominance and best-effort behavior
Efficient ORBE algorithm with small overhead over standard robust value iteration
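The tie-breaking idea can be shown in a deliberately minimal one-step setting (hypothetical numbers, a simplified sketch of the criterion rather than the paper's algorithm): two actions have identical worst-case expected rewards over a finite set of scenarios, so both are robust-optimal, and the ORBE-style refinement prefers the one with the higher return under a nominal, non-adversarial model.

```python
import numpy as np

# Two actions, each with expected rewards under two uncertainty scenarios
# (hypothetical numbers). Both share the same worst case of 1.0.
scenarios = {
    "a": np.array([1.0, 3.0]),   # worst case 1.0, but can do much better
    "b": np.array([1.0, 1.0]),   # worst case 1.0 in every scenario
}
nominal = {"a": 2.0, "b": 1.0}   # expected reward under a nominal model

# Step 1: robust optimization -- keep actions attaining the best worst case.
worst = {a: float(v.min()) for a, v in scenarios.items()}
best_worst = max(worst.values())
robust_optimal = [a for a, w in worst.items() if np.isclose(w, best_worst)]

# Step 2: best-effort tie-break -- among robust-optimal actions, maximize
# the return under the nominal (non-adversarial) model.
orbe_action = max(robust_optimal, key=nominal.get)
print(robust_optimal, orbe_action)  # ['a', 'b'] a
```

Worst-case reasoning alone cannot distinguish the two actions; the tie-break selects "a", which is never worse and sometimes strictly better, mirroring the dominance intuition behind ORBE.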