On the Complexity of Discounted Robust MDPs with $L_p$ Uncertainty Sets

📅 2026-05-08

🤖 AI Summary
This work asks whether robust Markov decision processes (RMDPs) under the discounted cost criterion with $(s,a)$-rectangular $L_p$ uncertainty sets admit a strongly polynomial-time algorithm. It shows that, for any compact uncertainty set, policy iteration for RMDPs is strongly polynomial given oracle access to solutions of robust Markov chains (RMCs). It then derives strongly polynomial time bounds for policy iteration on RMCs with $L_1$ and $L_\infty$ uncertainty sets, and experiments illustrate how fast policy iteration converges for RMDPs with these two sets. Finally, it establishes hardness results for RMCs with $L_p$ uncertainty sets when $p$ is an integer with $1 < p < \infty$, in contrast to the $L_1$ and $L_\infty$ cases.
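As a reference point, the setting can be written in the standard form below. The notation is ours, not the paper's: $\hat q_{s,a}$ is the nominal transition row for state $s$ and action $a \in A(s)$, $\kappa_{s,a} \ge 0$ its radius, $\gamma \in (0,1)$ the discount factor, and $\Delta(S)$ the probability simplex over states. $(s,a)$-rectangularity means the adversary picks each row from its own set $\mathcal{U}_{s,a}$ independently; fixing a policy $\pi$ removes the minimization and leaves an RMC.

$$
\mathcal{U}_{s,a} = \{\, q \in \Delta(S) : \| q - \hat q_{s,a} \|_p \le \kappa_{s,a} \,\}
$$

$$
v^*(s) = \min_{a \in A(s)} \Big[\, c(s,a) + \gamma \max_{q \in \mathcal{U}_{s,a}} \sum_{s'} q(s')\, v^*(s') \Big]
$$

$$
v^\pi(s) = c(s,\pi(s)) + \gamma \max_{q \in \mathcal{U}_{s,\pi(s)}} \sum_{s'} q(s')\, v^\pi(s')
$$

In this reading, policy iteration alternates between calling the RMC oracle to obtain $v^\pi$ and the improvement step $\pi'(s) \in \arg\min_{a \in A(s)} \big[\, c(s,a) + \gamma \max_{q \in \mathcal{U}_{s,a}} \sum_{s'} q(s')\, v^\pi(s') \big]$.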
📝 Abstract
A basic model in sequential decision making is the Markov decision process (MDP), which is extended to Robust MDPs (RMDPs) by allowing uncertainty in transition probabilities and optimizing against the worst-case transition probabilities from the uncertainty sets. The class of $(s, a)$-rectangular RMDPs with $L_p$ uncertainty sets provides a flexible and expressive model for such problems. We study this class of RMDPs with a discounted-sum cost criterion and a constant discount factor. The existence of an efficient algorithm for this class is a fundamental theoretical question in optimization and sequential decision making. Previous results only establish a strongly polynomial-time algorithm for $L_\infty$ uncertainty sets. In this work, our main results are as follows: (a) we show that for any compact uncertainty set, the policy iteration algorithm for RMDPs is strongly polynomial with oracle access to solutions of Robust Markov chains (RMCs); (b) we present strongly polynomial-time bounds on the policy iteration algorithm for RMCs with $L_1$ and $L_\infty$ uncertainty sets; and (c) we establish hardness results for RMCs with $L_p$ uncertainty sets for integer $p$ satisfying $1<p<\infty$. Finally, motivated by our theoretical bounds, we present experimental results showing how fast policy iteration converges for RMDPs with $L_1$ and $L_\infty$ uncertainty sets.
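To make results (a) and (b) concrete, here is a minimal Python sketch under assumptions that are ours rather than the paper's. Costs are minimized, and the adversary picks each transition row from an $L_1$ or $L_\infty$ ball of radius `kappa` around the nominal row, intersected with the simplex. The RMC oracle is itself a policy iteration over the adversary's choices. All helper names (`worst_case_l1`, `worst_case_linf`, `solve_rmc`, `robust_policy_iteration`) are hypothetical. The inner steps use the standard greedy solutions for these two norms: for $L_1$, move up to $\kappa/2$ mass onto the costliest successor; for $L_\infty$, fill box bounds in decreasing order of cost-to-go. Policy evaluation uses fixed-point iteration rather than an exact linear solve. The sketch illustrates the algorithm's structure only and does not reproduce the paper's complexity bounds.

```python
# Minimal sketch (assumptions, not the paper's code): discounted-cost RMDP with
# (s,a)-rectangular L1 or L-infinity balls of radius kappa around nominal rows.
# The agent minimizes cost; the adversary picks worst-case transition rows.


def dot(q, v):
    return sum(a * b for a, b in zip(q, v))


def worst_case_l1(p_hat, v, kappa):
    """max q.v over {q in simplex : ||q - p_hat||_1 <= kappa}, greedy closed form."""
    q = list(p_hat)
    top = max(range(len(v)), key=lambda i: v[i])
    shift = min(kappa / 2.0, 1.0 - q[top])  # mass moved onto the costliest successor
    q[top] += shift
    for i in sorted(range(len(v)), key=lambda i: v[i]):  # drain cheapest first
        if shift <= 0.0:
            break
        if i != top:
            take = min(shift, q[i])
            q[i] -= take
            shift -= take
    return q


def worst_case_linf(p_hat, v, kappa):
    """max q.v over {q in simplex : |q_i - p_hat_i| <= kappa for all i}, greedy fill."""
    lo = [max(0.0, x - kappa) for x in p_hat]
    hi = [min(1.0, x + kappa) for x in p_hat]
    q, left = list(lo), 1.0 - sum(lo)  # left >= 0 because p_hat is feasible
    for i in sorted(range(len(v)), key=lambda i: -v[i]):  # costliest first
        add = min(left, hi[i] - lo[i])
        q[i] += add
        left -= add
    return q


def evaluate(costs, P, gamma, tol=1e-10):
    """Value of a fixed Markov chain: fixed point of v = c + gamma * P v."""
    v = [0.0] * len(costs)
    while True:
        new = [costs[s] + gamma * dot(P[s], v) for s in range(len(v))]
        if max(abs(a - b) for a, b in zip(new, v)) < tol:
            return new
        v = new


def solve_rmc(costs, p_hats, kappa, gamma, inner, eps=1e-8):
    """Worst-case value of a robust Markov chain via the adversary's policy iteration."""
    P = [list(p) for p in p_hats]  # start from the nominal transitions
    while True:
        v = evaluate(costs, P, gamma)
        changed = False
        for s in range(len(costs)):
            q = inner(p_hats[s], v, kappa)
            if dot(q, v) > dot(P[s], v) + eps:  # switch only on strict improvement
                P[s], changed = q, True
        if not changed:
            return v


def robust_policy_iteration(c, p_hat, kappa, gamma, inner, eps=1e-8):
    """Agent's policy iteration; c[s][a] is a cost, p_hat[s][a] a nominal row."""
    n = len(c)
    pi = [0] * n
    while True:
        v = solve_rmc([c[s][pi[s]] for s in range(n)],
                      [p_hat[s][pi[s]] for s in range(n)], kappa, gamma, inner)
        new_pi = list(pi)
        for s in range(n):
            q = [c[s][a] + gamma * dot(inner(p_hat[s][a], v, kappa), v)
                 for a in range(len(c[s]))]
            best = min(range(len(q)), key=lambda a: q[a])
            if q[best] < q[pi[s]] - eps:  # keep current action unless strictly better
                new_pi[s] = best
        if new_pi == pi:
            return pi, v
        pi = new_pi


if __name__ == "__main__":
    c = [[1.0, 2.0], [0.0]]                         # state 1 is cost-free
    p_hat = [[[0.9, 0.1], [0.2, 0.8]], [[0.0, 1.0]]]
    pi, v = robust_policy_iteration(c, p_hat, 0.2, 0.9, worst_case_l1)
    print(pi, [round(x, 3) for x in v])             # [1, 0] [4.634, 2.195]
```

Both loops switch only on strict improvement (the `eps` tests), which keeps them from cycling between tied choices. On this toy instance, setting `kappa=0` recovers the nominal optimum: policy `[1, 0]` with $v(0) = 2/0.82 \approx 2.439$. The $L_1$ uncertainty with `kappa=0.2` raises the worst-case cost at state 0 to about $4.634$.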
Problem

Robust MDPs
$L_p$ uncertainty sets
discounted-sum cost
computational complexity
policy iteration
Innovation

Robust MDPs
$L_p$ uncertainty sets
policy iteration
strongly polynomial algorithm
computational hardness