MDP Geometry, Normalization and Reward Balancing Solvers

📅 2024-07-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses policy optimization for Markov decision processes (MDPs) with unknown transition probabilities. It introduces a geometric normalization perspective: a family of value-function transformations that preserve the advantage of every action under any policy, establishing a reward-balancing framework for computing optimal policies. Iterative application of these transformations yields a class of sampling-based reward-balancing algorithms. Theoretically, the approach improves upon state-of-the-art sample-complexity bounds without requiring prior knowledge of the model. Empirically, it speeds convergence and improves policy robustness while allowing the policy to be read off directly, without auxiliary re-scaling. The core contribution is a characterization of MDPs as advantage-invariant geometric structures, which enhances both sample efficiency and theoretical guarantees.

📝 Abstract
We present a new geometric interpretation of Markov Decision Processes (MDPs) with a natural normalization procedure that allows us to adjust the value function at each state without altering the advantage of any action with respect to any policy. This advantage-preserving transformation of the MDP motivates a class of algorithms, which we call Reward Balancing, that solve MDPs by iterating through these transformations until an approximately optimal policy can be trivially found. We provide a convergence analysis of several algorithms in this class, in particular showing that for MDPs with unknown transition probabilities we can improve upon state-of-the-art sample complexity results.
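The key property the abstract describes can be illustrated with a small tabular experiment. The sketch below is not the paper's algorithm; it demonstrates the underlying invariance with a standard potential-based shift, r'(s,a) = r(s,a) + γ Σ_s' P(s'|s,a) h(s') − h(s) for an arbitrary per-state potential h, which adjusts state values by h while leaving every action advantage Q(s,a) − V(s) unchanged:

```python
import numpy as np

def q_values(P, r, gamma, iters=2000):
    """Optimal Q-values of a tabular MDP via value iteration.
    P: (S, A, S) transition tensor, r: (S, A) reward matrix."""
    S, A = r.shape
    q = np.zeros((S, A))
    for _ in range(iters):
        v = q.max(axis=1)          # greedy state values
        q = r + gamma * (P @ v)    # Bellman optimality update
    return q

rng = np.random.default_rng(0)
S, A, gamma = 4, 3, 0.9
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)          # normalize rows to distributions
r = rng.standard_normal((S, A))

h = rng.standard_normal(S)                 # arbitrary per-state adjustment
r_shifted = r + gamma * (P @ h) - h[:, None]

q0 = q_values(P, r, gamma)
q1 = q_values(P, r_shifted, gamma)
adv0 = q0 - q0.max(axis=1, keepdims=True)  # advantages w.r.t. greedy policy
adv1 = q1 - q1.max(axis=1, keepdims=True)
print(np.allclose(adv0, adv1, atol=1e-6))  # advantages coincide
```

A Reward Balancing solver, as sketched in the abstract, would choose such adjustments iteratively until the transformed rewards make the (approximately) optimal action at each state obvious.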
Problem

Research questions and friction points this paper is trying to address.

Geometric interpretation of MDPs with normalization
Advantage-preserving transformation for value function adjustment
Reward Balancing algorithms for solving MDPs efficiently
Innovation

Methods, ideas, or system contributions that make the work stand out.

Geometric interpretation of MDPs with normalization
Reward Balancing algorithms for MDP solutions
Improved sample complexity for unknown transitions
Arsenii Mustafin
PhD student, Boston University
Reinforcement Learning · Explainable AI
Aleksei Pakharev
Computational Oncology, Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA
Alexander Olshevsky
Department of ECE, Boston University, Boston, MA 02215, USA
I. Paschalidis
Department of ECE, Boston University, Boston, MA 02215, USA