Geometry-Inspired Unified Framework for Discounted and Average Reward MDPs

๐Ÿ“… 2025-10-27
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
This paper unifies the theoretical analysis of discounted and average-reward Markov decision processes (MDPs). Addressing the long-standing separation between these two settings, it extends the geometric interpretation previously developed for discounted MDPs to the average-reward case, yielding a single modeling framework grounded in that geometric view. Using tools from dynamic programming and ergodic theory, the authors prove that, under the assumption of a unique and ergodic optimal policy, value iteration converges at a geometric rate in both the discounted and average-reward settings. The work thus provides a deep theoretical unification of the two paradigms and, for the first time, establishes geometric convergence of value iteration in average-reward MDPs, offering a new analytical lens for reinforcement learning and optimal control.

๐Ÿ“ Abstract
The theoretical analysis of Markov Decision Processes (MDPs) is commonly split into two cases - the average-reward case and the discounted-reward case - which, while sharing similarities, are typically analyzed separately. In this work, we extend a recently introduced geometric interpretation of MDPs for the discounted-reward case to the average-reward case, thereby unifying both. This allows us to extend a major result known for the discounted-reward case to the average-reward case: under a unique and ergodic optimal policy, the Value Iteration algorithm achieves a geometric convergence rate.
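As a concrete illustration of the discounted-case baseline result, the following minimal Python sketch runs Value Iteration on a small hypothetical MDP (the transition tensor `P`, reward matrix `R`, and discount `gamma` are illustrative assumptions, not taken from the paper) and reports consecutive error ratios, which settle near the discount factor, i.e. the geometric rate implied by the Bellman operator's gamma-contraction.

```python
import numpy as np

# A minimal, hypothetical 2-state / 2-action MDP (not from the paper), used only
# to illustrate the geometric convergence of discounted Value Iteration.
P = np.array([[[0.9, 0.1],    # P[a, s, s']: action 0
               [0.2, 0.8]],
              [[0.3, 0.7],    # action 1
               [0.6, 0.4]]])
R = np.array([[1.0, 0.0],     # R[s, a]
              [0.5, 2.0]])
gamma = 0.9

def bellman_backup(V):
    # Q[s, a] = R[s, a] + gamma * sum_s' P[a, s, s'] * V[s']
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    return Q.max(axis=1)

# Approximate the fixed point V* with many backups, then measure the
# sup-norm error of fresh iterates against it.
V_star = np.zeros(2)
for _ in range(5000):
    V_star = bellman_backup(V_star)

V = np.zeros(2)
errors = []
for _ in range(25):
    V = bellman_backup(V)
    errors.append(np.max(np.abs(V - V_star)))

# Consecutive error ratios settle near a constant <= gamma: geometric convergence.
print([round(errors[k + 1] / errors[k], 3) for k in range(10)])
```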
Problem

Research questions and friction points this paper is trying to address.

Unifying discounted and average reward MDP analysis
Extending geometric interpretation to average-reward MDPs
Establishing geometric convergence for Value Iteration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified geometric framework for MDPs
Extends discounted results to average rewards
Proves geometric convergence for Value Iteration
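The paper's new result concerns the average-reward case. Below is a minimal sketch of standard Relative Value Iteration on the same hypothetical MDP; it is the textbook form of the algorithm whose geometric convergence (under a unique, ergodic optimal policy) the paper establishes, not the authors' own implementation, and `P`, `R`, and `ref_state` are illustrative assumptions.

```python
import numpy as np

# Standard Relative Value Iteration for the average-reward criterion
# (a textbook sketch of the setting the paper analyzes, not its exact method).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # P[a, s, s']: action 0
              [[0.3, 0.7], [0.6, 0.4]]])  # action 1
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])                # R[s, a]

h = np.zeros(2)        # relative value (bias) function
ref_state = 0          # reference state that pins down the additive constant
gain = 0.0
for _ in range(200):
    Q = R + np.einsum("ast,t->sa", P, h)   # undiscounted Bellman backup
    h_new = Q.max(axis=1)
    gain = h_new[ref_state]                # running estimate of the optimal average reward
    h = h_new - gain                       # subtract the gain so the iterates stay bounded

print("estimated optimal average reward:", round(gain, 4))
print("relative values:", h)
```

For ergodic (unichain) MDPs, subtracting the value at the reference state keeps the iterates bounded, and the paper's result asserts that these iterates converge geometrically, mirroring the discounted case above.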
๐Ÿ”Ž Similar Papers
No similar papers found.