🤖 AI Summary
In robust Markov decision processes (RMDPs), several policies can be optimal in the worst case yet differ in expected return under non-adversarial transition probabilities. To address this, we propose the optimal robust best-effort (ORBE) policy: it guarantees the optimal worst-case expected return and additionally achieves a maximal expected return under transition probabilities that are not fully adversarial. The criterion draws on the game-theoretic notions of dominance and best-effort, going beyond a sole focus on worst-case optimization. Building on robust value iteration for s-rectangular uncertainty sets, we present an algorithm that computes ORBE policies with a small overhead compared to standard robust value iteration. We prove that ORBE policies always exist and characterize their structure. Numerical experiments show the feasibility of the approach, and ORBE policies serve as a principled tie-breaker among optimal robust policies.
📝 Abstract
We study the common generalization of Markov decision processes (MDPs) with sets of transition probabilities, known as robust MDPs (RMDPs). A standard goal in RMDPs is to compute a policy that maximizes the expected return under an adversarial choice of the transition probabilities. If the uncertainty in the probabilities is independent between the states, a property known as s-rectangularity, such optimal robust policies can be computed efficiently using robust value iteration. However, there might still be multiple optimal robust policies, which, while equivalent with respect to the worst case, yield different expected returns under non-adversarial choices of the transition probabilities. Hence, we propose a refined policy selection criterion for RMDPs, drawing inspiration from the notions of dominance and best-effort in game theory. Instead of seeking a policy that only maximizes the worst-case expected return, we additionally require the policy to achieve a maximal expected return under different (i.e., not fully adversarial) transition probabilities. We call such a policy an optimal robust best-effort (ORBE) policy. We prove that ORBE policies always exist, characterize their structure, and present an algorithm to compute them with a small overhead compared to standard robust value iteration. ORBE policies offer a principled tie-breaker among optimal robust policies. Numerical experiments show the feasibility of our approach.
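The tie among optimal robust policies described in the abstract can be seen on a toy example. The sketch below is not the paper's ORBE algorithm. It is a minimal robust value iteration under simplifying assumptions: a hypothetical three-state RMDP, a discount factor of 0.9, and a finite list of candidate next-state distributions per state-action pair (a special case of s-rectangularity). All names, such as `UNCERTAINTY` and `q_values`, are illustrative.

```python
import numpy as np

GAMMA = 0.9  # assumed discount factor

# Hypothetical states: 0 = start, 1 = good (reward 1), 2 = bad (reward 0).
REWARD = np.array([0.0, 1.0, 0.0])

# UNCERTAINTY[s][a] lists the candidate next-state distributions that the
# adversary may choose from when action a is played in state s.
UNCERTAINTY = {
    0: {
        "a": [np.array([0.0, 0.0, 1.0]), np.array([0.0, 1.0, 0.0])],
        "b": [np.array([0.0, 0.0, 1.0])],
    },
    1: {"stay": [np.array([0.0, 1.0, 0.0])]},
    2: {"stay": [np.array([0.0, 0.0, 1.0])]},
}


def q_values(state, values, pick):
    """Q-value of each action in `state`; `pick` is min (adversarial) or max."""
    return {
        action: float(REWARD[state] + GAMMA * pick(p @ values for p in dists))
        for action, dists in UNCERTAINTY[state].items()
    }


def robust_value_iteration(tol=1e-10, max_iter=10_000):
    """Iterate the robust Bellman operator: max over actions, min over the set."""
    values = np.zeros(len(REWARD))
    for _ in range(max_iter):
        new_values = np.array(
            [max(q_values(s, values, min).values()) for s in range(len(REWARD))]
        )
        if np.max(np.abs(new_values - values)) < tol:
            return new_values
        values = new_values
    return values


if __name__ == "__main__":
    v = robust_value_iteration()
    print("worst-case Q in state 0:", q_values(0, v, min))
    print("non-adversarial Q in state 0:", q_values(0, v, max))
```

In this toy model both actions in state 0 have the same worst-case value, so standard robust value iteration cannot distinguish them, while action `a` does strictly better whenever the transition is not the adversarial one. A refined criterion such as ORBE is meant to prefer `a` in situations of this kind.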