🤖 AI Summary
This work addresses policy optimization for Markov decision processes (MDPs) with unknown transition probabilities. We introduce a geometric normalization perspective: a family of value-function transformations that preserve the advantage of every action under any policy, thereby establishing a reward-balancing framework for computing optimal policies. Iterating these transformations yields a class of sampling-based reward-balancing algorithms. Theoretically, the approach improves upon state-of-the-art sample-complexity bounds without requiring prior knowledge of the model. Empirically, it significantly improves convergence speed and policy robustness while allowing a near-optimal policy to be extracted directly, without auxiliary re-scaling. Our core contribution is the first characterization of MDPs as advantage-invariant geometric structures, which improves both sample efficiency and theoretical guarantees.
📝 Abstract
We present a new geometric interpretation of Markov Decision Processes (MDPs) with a natural normalization procedure that allows us to adjust the value function at each state without altering the advantage of any action with respect to any policy. This advantage-preserving transformation of the MDP motivates a class of algorithms, which we call Reward Balancing, that solve MDPs by iterating these transformations until an approximately optimal policy can be trivially found. We provide a convergence analysis of several algorithms in this class, in particular showing that for MDPs with unknown transition probabilities we can improve upon state-of-the-art sample complexity results.
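To make the advantage-preserving idea concrete, the sketch below illustrates the closest classical analogue: shifting rewards by a per-state potential, which changes every Q-value by a state-dependent constant and therefore leaves all action advantages unchanged. This is a minimal numerical illustration of the general principle, not the paper's specific Reward Balancing construction; the toy MDP (`P`, `R`, `gamma`) and the potential `phi` are made-up values for demonstration.

```python
import numpy as np

# Toy 2-state, 2-action MDP (hypothetical numbers, illustration only).
gamma = 0.9
P = np.array([  # P[s, a, s']: transition probabilities
    [[0.8, 0.2], [0.1, 0.9]],
    [[0.5, 0.5], [0.9, 0.1]],
])
R = np.array([  # R[s, a]: expected immediate reward
    [1.0, 0.0],
    [0.5, 2.0],
])

def q_values(P, R, gamma, iters=2000):
    """Q-value iteration for the optimal Q-function."""
    Q = np.zeros_like(R)
    for _ in range(iters):
        V = Q.max(axis=1)          # greedy value at each state
        Q = R + gamma * (P @ V)    # Bellman optimality backup
    return Q

def advantages(Q):
    # Advantage of each action relative to the best action in its state.
    return Q - Q.max(axis=1, keepdims=True)

# Per-state potential shift: r'(s,a) = r(s,a) + gamma * E[phi(s')] - phi(s).
# This changes Q*(s,a) by exactly -phi(s), so advantages are untouched.
phi = np.array([3.0, -1.0])  # arbitrary per-state adjustment
R_shaped = R + gamma * (P @ phi) - phi[:, None]

A = advantages(q_values(P, R, gamma))
A_shaped = advantages(q_values(P, R_shaped, gamma))
print(np.allclose(A, A_shaped))  # advantages are preserved
```

Once every action's advantage is zero (or nearly so) at every state under such transformations, a greedy policy can be read off directly, which is the "trivially found" endpoint the abstract describes.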