Strongly Polynomial Time Complexity of Policy Iteration for $L_\infty$ Robust MDPs

📅 2026-01-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study resolves a long-standing open question: whether a strongly polynomial-time algorithm exists for $(s, a)$-rectangular $L_\infty$ robust Markov decision processes (RMDPs) under a fixed discount factor. The authors propose a robust policy iteration algorithm and establish, for the first time, that it runs in strongly polynomial time when the discount factor is fixed. This extends Ye's seminal result on the strong polynomiality of policy iteration for classical MDPs to the robust optimization setting, confirming the existence of a strongly polynomial-time algorithm for this class of RMDPs and settling a fundamental algorithmic question in the field.

📝 Abstract
Markov decision processes (MDPs) are a fundamental model in sequential decision making. Robust MDPs (RMDPs) extend this framework by allowing uncertainty in transition probabilities and optimizing against the worst-case realization of that uncertainty. In particular, $(s, a)$-rectangular RMDPs with $L_\infty$ uncertainty sets form a fundamental and expressive model: they subsume classical MDPs and turn-based stochastic games. We consider this model with discounted payoffs. The existence of polynomial and strongly polynomial-time algorithms is a fundamental problem for these optimization models. For MDPs, linear programming yields polynomial-time algorithms for an arbitrary discount factor, and the seminal work of Ye established a strongly polynomial-time bound for a fixed discount factor. The generalization of such results to RMDPs has remained an important open problem. In this work, we show that a robust policy iteration algorithm runs in strongly polynomial time for $(s, a)$-rectangular $L_\infty$ RMDPs with a constant (fixed) discount factor, resolving an important algorithmic question.
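To make the setting concrete, here is a minimal sketch of generic robust policy iteration for $(s, a)$-rectangular $L_\infty$ uncertainty sets. This is not the paper's algorithm or its complexity analysis; the function names, the toy inputs, and the fixed-point evaluation scheme are all illustrative assumptions. The inner adversarial problem — minimizing $p \cdot v$ over an $L_\infty$ ball around a nominal row intersected with the simplex — is a simple linear program solvable by a greedy exchange argument: shift probability mass from high-value states to low-value states until the per-coordinate caps bind.

```python
def worst_case_transition(p_hat, v, eps):
    """Solve min_p p.v over {p : |p_i - p_hat_i| <= eps, p >= 0, sum(p) = 1}
    by a greedy exchange: move mass from high-value to low-value states."""
    n = len(v)
    order = sorted(range(n), key=lambda i: v[i])     # states by value, ascending
    p = list(p_hat)
    inc = [min(eps, 1.0 - p_hat[i]) for i in range(n)]  # room to add mass
    dec = [min(eps, p_hat[i]) for i in range(n)]        # room to remove mass
    lo, hi = 0, n - 1
    while lo < hi:
        i, j = order[lo], order[hi]
        m = min(inc[i], dec[j])   # transferable amount at this step
        p[i] += m
        p[j] -= m
        inc[i] -= m
        dec[j] -= m
        if inc[i] <= 1e-12:
            lo += 1
        if dec[j] <= 1e-12:
            hi -= 1
    return p


def robust_policy_iteration(P_hat, R, eps, gamma, n_eval=200, max_outer=100):
    """Policy iteration against worst-case transitions (illustrative only).
    P_hat[s][a]: nominal transition row for state s, action a; R[s][a]: reward."""
    def dot(x, y):
        return sum(a * b for a, b in zip(x, y))

    nS, nA = len(P_hat), len(P_hat[0])
    pi = [0] * nS
    for _ in range(max_outer):
        # Robust policy evaluation: fixed-point iteration on the robust
        # Bellman operator for the current policy.
        v = [0.0] * nS
        for _ in range(n_eval):
            v = [R[s][pi[s]]
                 + gamma * dot(worst_case_transition(P_hat[s][pi[s]], v, eps), v)
                 for s in range(nS)]
        # Robust greedy improvement using worst-case Q-values.
        new_pi = [max(range(nA),
                      key=lambda a: R[s][a]
                      + gamma * dot(worst_case_transition(P_hat[s][a], v, eps), v))
                  for s in range(nS)]
        if new_pi == pi:
            break
        pi = new_pi
    return pi, v
```

The $(s, a)$-rectangularity is what makes the adversary decomposable: the worst-case row can be chosen independently per state-action pair, so each Bellman backup only needs the one-dimensional greedy solve above rather than a joint optimization.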
Problem

Research questions and friction points this paper is trying to address.

Robust MDPs
Strongly Polynomial Time
Policy Iteration
L∞ Uncertainty
Discounted Payoffs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Robust MDPs
Strongly Polynomial Time
Policy Iteration
L∞ Uncertainty
Discounted Payoffs