AI Summary
This paper unifies the theoretical analysis of discounted and average-reward Markov decision processes (MDPs). Addressing the long-standing theoretical separation between these two MDP classes, it extends the geometric interpretation developed for discounted MDPs to the average-reward setting, yielding a single framework grounded in that shared geometric view. By combining dynamic programming, ergodicity theory, and value iteration, the authors prove that, under the assumption of a unique ergodic optimal policy, value iteration converges geometrically in both the discounted and the average-reward case. This work thus unifies the two paradigms and, for the first time, establishes geometric convergence of value iteration in average-reward MDPs, providing a novel analytical paradigm for reinforcement learning and optimal control.
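To make the discounted-case contraction concrete, here is a minimal sketch of value iteration on a toy MDP. The transition tensor `P`, reward table `R`, and discount `gamma = 0.9` are invented for illustration and are not from the paper; the printed ratio of successive errors stays at or below `gamma`, which is exactly the geometric rate the discounted analysis guarantees.

```python
import numpy as np

# Toy MDP: 3 states, 2 actions. P[a, s, s'] are transition probabilities,
# R[a, s] are expected rewards. These numbers are illustrative only.
P = np.array([
    [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.3, 0.5]],   # action 0
    [[0.1, 0.6, 0.3], [0.4, 0.4, 0.2], [0.3, 0.3, 0.4]],   # action 1
])
R = np.array([
    [1.0, 0.0, 2.0],   # action 0
    [0.5, 1.5, 0.0],   # action 1
])
gamma = 0.9

def bellman_update(V):
    # Q[a, s] = R[a, s] + gamma * sum_{s'} P[a, s, s'] * V[s']
    Q = R + gamma * (P @ V)
    return Q.max(axis=0)

V = np.zeros(3)
prev_err = None
for k in range(50):
    V_new = bellman_update(V)
    err = np.max(np.abs(V_new - V))
    if prev_err is not None and prev_err > 0:
        # Ratio of successive errors; bounded by gamma for a contraction.
        print(f"iter {k}: error {err:.2e}, ratio {err / prev_err:.3f}")
    prev_err = err
    V = V_new
```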
Abstract
The theoretical analysis of Markov Decision Processes (MDPs) is commonly split into two cases - the average-reward case and the discounted-reward case - which, while sharing similarities, are typically analyzed separately. In this work, we extend a recently introduced geometric interpretation of MDPs for the discounted-reward case to the average-reward case, thereby unifying both. This allows us to extend a major result known for the discounted-reward case to the average-reward case: under a unique and ergodic optimal policy, the Value Iteration algorithm achieves a geometric convergence rate.
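In the average-reward case the raw value-iteration iterates grow without bound, so convergence is naturally measured up to an additive constant. The sketch below uses the standard relative form of value iteration, which subtracts the value at a reference state after each sweep, and tracks the error in the span seminorm; the toy MDP is the same illustrative one as above (an assumption, not the paper's example, and the paper's own construction may differ), with every policy inducing an ergodic chain so that the span error contracts geometrically, as the paper's result predicts.

```python
import numpy as np

# Same toy 3-state, 2-action MDP as above; numbers are illustrative only.
P = np.array([
    [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.3, 0.5]],   # action 0
    [[0.1, 0.6, 0.3], [0.4, 0.4, 0.2], [0.3, 0.3, 0.4]],   # action 1
])
R = np.array([
    [1.0, 0.0, 2.0],   # action 0
    [0.5, 1.5, 0.0],   # action 1
])

def span(x):
    # Span seminorm: the natural error measure in the average-reward
    # setting, where values are only defined up to a constant.
    return x.max() - x.min()

h = np.zeros(3)
prev = None
for k in range(100):
    # Undiscounted Bellman operator, then subtract the value at a
    # reference state so the iterates stay bounded (relative VI).
    Th = (R + P @ h).max(axis=0)
    h_new = Th - Th[0]
    err = span(h_new - h)
    if prev is not None and prev > 1e-12:
        print(f"iter {k}: span error {err:.2e}, ratio {err / prev:.3f}")
    if err < 1e-10:
        # At the fixed point h + rho = T h with h[0] = 0, so Th[0]
        # recovers the gain (the optimal average reward).
        print(f"estimated optimal average reward: {Th[0]:.4f}")
        break
    prev = err
    h = h_new
```

The printed ratios settling at a constant below 1 is the empirical signature of the geometric rate; the paper's contribution is proving that this rate holds in general under a unique and ergodic optimal policy, not just on a particular example.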