AI Summary
This paper unifies the theoretical analysis of discounted and average-reward Markov decision processes (MDPs). Addressing the long-standing theoretical separation between these two MDP classes, it extends the geometric interpretation developed for discounted MDPs to the average-reward setting, yielding a single framework grounded in that shared geometric view. By combining dynamic programming, ergodicity theory, and value iteration, the authors prove that, under the assumption of a unique ergodic optimal policy, value iteration converges geometrically in both the discounted and the average-reward case. This work thus unifies the two paradigms and, for the first time, establishes geometric convergence of value iteration in average-reward MDPs, providing a novel analytical paradigm for reinforcement learning and optimal control.
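To make the discounted-case contraction concrete, here is a minimal sketch of value iteration on a toy MDP. The transition tensor `P`, reward table `R`, and discount `gamma = 0.9` are invented for illustration and are not from the paper; the printed ratio of successive errors stays at or below `gamma`, which is exactly the geometric rate the discounted analysis guarantees.

```python
import numpy as np

# Toy MDP: 3 states, 2 actions. P[a, s, s'] are transition probabilities,
# R[a, s] are expected rewards. These numbers are illustrative only.
P = np.array([
    [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.3, 0.5]],   # action 0
    [[0.1, 0.6, 0.3], [0.4, 0.4, 0.2], [0.3, 0.3, 0.4]],   # action 1
])
R = np.array([
    [1.0, 0.0, 2.0],   # action 0
    [0.5, 1.5, 0.0],   # action 1
])
gamma = 0.9

def bellman_update(V):
    # Q[a, s] = R[a, s] + gamma * sum_{s'} P[a, s, s'] * V[s']
    Q = R + gamma * (P @ V)
    return Q.max(axis=0)

V = np.zeros(3)
prev_err = None
for k in range(50):
    V_new = bellman_update(V)
    err = np.max(np.abs(V_new - V))
    if prev_err is not None and prev_err > 0:
        # Ratio of successive errors; bounded by gamma for a contraction.
        print(f"iter {k}: error {err:.2e}, ratio {err / prev_err:.3f}")
    prev_err = err
    V = V_new
```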
Abstract
The theoretical analysis of Markov Decision Processes (MDPs) is commonly split into two cases - the average-reward case and the discounted-reward case - which, while sharing similarities, are typically analyzed separately. In this work, we extend a recently introduced geometric interpretation of MDPs for the discounted-reward case to the average-reward case, thereby unifying both. This allows us to extend a major result known for the discounted-reward case to the average-reward case: under a unique and ergodic optimal policy, the Value Iteration algorithm achieves a geometric convergence rate.
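In the average-reward case the raw value-iteration iterates grow without bound, so convergence is naturally measured up to an additive constant. The sketch below uses the standard relative form of value iteration, which subtracts the value at a reference state after each sweep, and tracks the error in the span seminorm; the toy MDP is the same illustrative one as above (an assumption, not the paper's example, and the paper's own construction may differ), with every policy inducing an ergodic chain so that the span error contracts geometrically, as the paper's result predicts.

```python
import numpy as np

# Same toy 3-state, 2-action MDP as above; numbers are illustrative only.
P = np.array([
    [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.3, 0.5]],   # action 0
    [[0.1, 0.6, 0.3], [0.4, 0.4, 0.2], [0.3, 0.3, 0.4]],   # action 1
])
R = np.array([
    [1.0, 0.0, 2.0],   # action 0
    [0.5, 1.5, 0.0],   # action 1
])

def span(x):
    # Span seminorm: the natural error measure in the average-reward
    # setting, where values are only defined up to a constant.
    return x.max() - x.min()

h = np.zeros(3)
prev = None
for k in range(100):
    # Undiscounted Bellman operator, then subtract the value at a
    # reference state so the iterates stay bounded (relative VI).
    Th = (R + P @ h).max(axis=0)
    h_new = Th - Th[0]
    err = span(h_new - h)
    if prev is not None and prev > 1e-12:
        print(f"iter {k}: span error {err:.2e}, ratio {err / prev:.3f}")
    if err < 1e-10:
        # At the fixed point h + rho = T h with h[0] = 0, so Th[0]
        # recovers the gain (the optimal average reward).
        print(f"estimated optimal average reward: {Th[0]:.4f}")
        break
    prev = err
    h = h_new
```

The printed ratios settling at a constant below 1 is the empirical signature of the geometric rate; the paper's contribution is proving that this rate holds in general under a unique and ergodic optimal policy, not just on a particular example.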