Value Iteration with Guessing for Markov Chains and Markov Decision Processes

📅 2025-05-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: For reachability and stochastic shortest path problems on Markov chains (MCs) and Markov decision processes (MDPs), existing value iteration (VI) algorithms require exponentially many Bellman updates for MCs in the worst case; whether polynomial-time preprocessing can bring this down to subexponentially many updates has remained an open question. Method: We propose a VI framework based on guessing values, incorporating graph-theoretic preprocessing, adaptive value-range estimation, and update pruning. Contribution/Results: For MCs, we give an almost-linear-time preprocessing algorithm after which VI with guessing needs only subexponentially many Bellman updates, the first such result. For MDPs, we refine the convergence analysis of VI to obtain tighter iteration bounds, and we present a practical guess-driven VI algorithm. On several benchmark examples from the literature, our approach considerably reduces iteration count and runtime compared with existing VI-based approaches.

📝 Abstract

Two standard models for probabilistic systems are Markov chains (MCs) and Markov decision processes (MDPs). Classic objectives for such models in control and planning problems are reachability and stochastic shortest path. The widely studied algorithmic approach for these problems is the Value Iteration (VI) algorithm, which iteratively applies local updates called Bellman updates. There are many practical approaches for VI in the literature, but they all require exponentially many Bellman updates for MCs in the worst case. A preprocessing step is an algorithm that is discrete, graph-theoretical, and requires linear space. An important open question is whether, after polynomial-time preprocessing, VI can be achieved with subexponentially many Bellman updates. In this work, we present a new approach for VI based on guessing values. Our theoretical contributions are twofold. First, for MCs, we present an almost-linear-time preprocessing algorithm after which, along with guessing values, VI requires only subexponentially many Bellman updates. Second, we present an improved analysis of the speed of convergence of VI for MDPs. Finally, we present a practical algorithm for MDPs based on our new approach. Experimental results show that our approach provides a considerable improvement over existing VI-based approaches on several benchmark examples from the literature.
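To make the notion of a Bellman update concrete, here is a minimal sketch of plain value iteration for reachability on a Markov chain. It is a textbook baseline, not the guessing-based algorithm of the paper. The four-state chain, the dictionary encoding, and the names `bellman_update` and `value_iteration` are assumptions made for illustration only.

```python
def bellman_update(P, targets, x):
    """Apply one Bellman update to every state of a Markov chain.

    P maps each state to a dict {successor: probability} (hypothetical encoding).
    Target states keep value 1; every other state takes the expected value
    of its successors under the current estimate x.
    """
    return {
        s: 1.0 if s in targets else sum(p * x[t] for t, p in P[s].items())
        for s in P
    }


def value_iteration(P, targets, tol=1e-10, max_iters=100_000):
    """Iterate Bellman updates from below until successive estimates are close.

    Note: stopping when the change drops below tol is a common simplification.
    It does not by itself guarantee that the result is within tol of the true
    reachability probabilities.
    """
    x = {s: 1.0 if s in targets else 0.0 for s in P}
    for i in range(1, max_iters + 1):
        nxt = bellman_update(P, targets, x)
        diff = max(abs(nxt[s] - x[s]) for s in P)
        x = nxt
        if diff < tol:
            return x, i
    return x, max_iters


# Illustrative chain: state 2 is the target, state 3 is a sink that never reaches it.
P = {
    0: {1: 0.5, 3: 0.5},
    1: {0: 0.5, 2: 0.5},
    2: {2: 1.0},
    3: {3: 1.0},
}
targets = {2}

values, iters = value_iteration(P, targets)
print(values, iters)
```

Solving the two linear equations by hand gives reachability probabilities 1/3 for state 0 and 2/3 for state 1, which the iteration approaches from below. On this small chain the error halves with every update; the worst-case instances mentioned in the abstract are those where such progress is exponentially slow.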

Problem

Research questions and friction points this paper is trying to address.

Improving Value Iteration efficiency for Markov Chains

Reducing the number of Bellman updates through preprocessing and value guessing

Enhancing convergence speed of VI for MDPs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Guessing values to speed up Value Iteration

Almost-linear-time preprocessing for Markov Chains

Improved convergence analysis for MDPs
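This page does not describe how the paper uses guessed values, so the following is only a generic illustration of why a guess can be cheap to check. For reachability on a Markov chain, the true values form the least fixed point of the Bellman operator, so any guess in [0, 1] that does not increase under a single Bellman update is an upper bound on the true values. The chain, the encoding, and the function names are assumptions, and the check below is a standard fixed-point argument rather than the authors' procedure.

```python
def bellman_update(P, targets, x):
    """One Bellman update for reachability on a Markov chain.

    P maps each state to a dict {successor: probability} (hypothetical encoding).
    """
    return {
        s: 1.0 if s in targets else sum(p * x[t] for t, p in P[s].items())
        for s in P
    }


def is_certified_upper_bound(P, targets, guess, eps=1e-12):
    """Return True if one Bellman update does not increase the guess anywhere.

    If B(guess) <= guess holds in every state, the least fixed point of B
    (the true reachability probabilities) lies below the guess. The small
    slack eps only absorbs floating-point rounding. A False result does NOT
    prove the guess is too low; the check is sufficient, not necessary.
    """
    updated = bellman_update(P, targets, guess)
    return all(updated[s] <= guess[s] + eps for s in P)


# Illustrative chain: state 2 is the target, state 3 is a sink.
# The true values are 1/3 for state 0 and 2/3 for state 1.
P = {
    0: {1: 0.5, 3: 0.5},
    1: {0: 0.5, 2: 0.5},
    2: {2: 1.0},
    3: {3: 1.0},
}
targets = {2}

high_guess = {0: 0.4, 1: 0.7, 2: 1.0, 3: 0.0}  # above the true values
low_guess = {0: 0.2, 1: 0.7, 2: 1.0, 3: 0.0}   # state 0 is guessed too low

high_ok = is_certified_upper_bound(P, targets, high_guess)
low_ok = is_certified_upper_bound(P, targets, low_guess)
print(high_ok, low_ok)
```

The first guess passes because updating it yields 0.35 and 0.7 for states 0 and 1, neither of which exceeds the guess. The second fails in state 0, where the update yields 0.35 against a guess of 0.2.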

🔎 Similar Papers

Order-Optimal Regret with Novel Policy Gradient Approaches in Infinite-Horizon Average Reward MDPs