🤖 AI Summary
This paper addresses planning and learning in Markov decision processes (MDPs). Methodologically, it proposes an enhanced value iteration algorithm that jointly integrates a rank-one approximation of the transition matrix with stationary distribution estimation: during policy evaluation, the power method is used to approximate the stationary distribution, and a policy-iteration-style update mechanism is introduced. This is the first incorporation of rank-one matrix approximation and stationary distribution modeling into the value iteration framework. Theoretically, the algorithm retains the same asymptotic convergence rate and computational complexity as standard value iteration and Q-learning, namely O(|S||A|/ε), in both the planning and learning settings. Empirically, it significantly outperforms a range of first-order and accelerated methods, achieving a superior trade-off between convergence speed and solution accuracy.
📝 Abstract
In this paper, we provide a novel algorithm for solving planning and learning problems in Markov decision processes. The proposed algorithm follows a policy-iteration-type update, using a rank-one approximation of the transition probability matrix in the policy evaluation step. This rank-one approximation is closely related to the stationary distribution of the corresponding transition probability matrix, which is approximated using the power method. We provide theoretical guarantees that the proposed algorithm converges to the optimal (action-)value function at the same rate and computational complexity as the value iteration algorithm in the planning problem and as the Q-learning algorithm in the learning problem. Through extensive numerical simulations, however, we show that the proposed algorithm consistently outperforms first-order algorithms and their accelerated versions on both planning and learning problems.
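To make the two ingredients of the policy evaluation step concrete, here is a minimal sketch (not the authors' implementation; function names and the example matrix are illustrative) of approximating the stationary distribution μ of a row-stochastic transition matrix P with the power method, and forming the associated rank-one approximation 1μᵀ, in which every row of P is replaced by μ:

```python
import numpy as np

def stationary_distribution(P, max_iters=1000, tol=1e-12):
    """Approximate the stationary distribution mu (with mu = mu @ P)
    of a row-stochastic matrix P via the power method."""
    n = P.shape[0]
    mu = np.full(n, 1.0 / n)          # start from the uniform distribution
    for _ in range(max_iters):
        mu_next = mu @ P              # one power-method step: mu <- mu P
        if np.linalg.norm(mu_next - mu, 1) < tol:
            return mu_next
        mu = mu_next
    return mu

# Illustrative 2-state chain (not from the paper).
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
mu = stationary_distribution(P)

# Rank-one approximation of P: the outer product 1 * mu^T,
# i.e. every row of P is replaced by the stationary distribution.
P_rank1 = np.ones((P.shape[0], 1)) @ mu[None, :]
```

For an ergodic chain, Pᵏ converges to 1μᵀ as k grows, which is why this rank-one matrix is a natural surrogate for P inside policy evaluation.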