Fisher-Rao Gradient Flows of Linear Programs and State-Action Natural Policy Gradients

📅 2024-03-28
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
This paper investigates natural policy gradient optimization based on the state-action distribution Fisher information matrix. We formulate policy optimization as a Fisher-Rao gradient flow over the state-action polytope (a linear program) and establish, for the first time, a geometry-dependent linear convergence theory. We refine the entropy-regularized error bound to yield a tighter estimate. Furthermore, we extend the analysis to perturbed Fisher-Rao flows and approximate natural gradient flows, proving their sublinear convergence and deriving explicit error upper bounds for state-action natural policy gradients. Our key contributions are: (1) the first systematic proof of linear convergence of the Fisher-Rao gradient flow for linear programming; (2) a unified characterization of bias and convergence-rate degradation induced by entropy regularization, perturbations, and approximation; and (3) a geometrically grounded convergence analysis framework for natural policy gradients.

📝 Abstract
Kakade's natural policy gradient method has been studied extensively in recent years, showing linear convergence with and without regularization. We study another natural gradient method based on the Fisher information matrix of the state-action distributions which has received little attention from the theoretical side. Here, the state-action distributions follow the Fisher-Rao gradient flow inside the state-action polytope with respect to a linear potential. Therefore, we study Fisher-Rao gradient flows of linear programs more generally and show linear convergence with a rate that depends on the geometry of the linear program. Equivalently, this yields an estimate on the error induced by entropic regularization of the linear program which improves existing results. We extend these results and show sublinear convergence for perturbed Fisher-Rao gradient flows and natural gradient flows up to an approximation error. In particular, these general results cover the case of state-action natural policy gradients.
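On the probability simplex (the simplest state-action polytope), the Fisher-Rao gradient flow of a linear potential is the classical replicator equation, which admits a closed-form solution. A minimal numerical sketch of this flow and its geometry-dependent linear convergence is below; the objective vector `c` and the time grid are illustrative choices, not from the paper.

```python
import numpy as np

def fisher_rao_flow(c, mu0, t):
    """Closed-form Fisher-Rao (replicator) gradient flow on the simplex.

    The flow d/dt mu_i = mu_i * (c_i - <c, mu>) maximizes the linear
    potential <c, mu>; its solution is mu(t) proportional to mu(0) * exp(t*c).
    """
    w = mu0 * np.exp(t * (c - c.max()))  # shift by c.max() for numerical stability
    return w / w.sum()

c = np.array([1.0, 0.5, 0.2])   # illustrative linear objective
mu0 = np.ones(3) / 3            # uniform start in the simplex interior
for t in [0.0, 10.0, 20.0]:
    mu = fisher_rao_flow(c, mu0, t)
    print(t, mu, c.max() - c @ mu)  # suboptimality gap shrinks over time
```

The suboptimality gap decays roughly like exp(-t * delta), where delta is the gap between the best and second-best entries of `c` — one concrete instance of a convergence rate depending on the geometry of the linear program.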
Problem

Research questions and friction points this paper is trying to address.

Analyzes Fisher-Rao gradient flows of linear programs.
Explores state-action natural policy gradients theoretically.
Improves error estimates for entropic regularization of linear programs.
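The entropic-regularization connection can be illustrated on the simplex, where the regularized linear program has a well-known softmax closed form. The sketch below (illustrative objective `c`; temperatures chosen here, not from the paper) shows the regularization bias vanishing as the temperature goes to zero.

```python
import numpy as np

def entropic_lp_simplex(c, tau):
    """Maximizer of <c, mu> + tau * H(mu) over the probability simplex.

    The unique solution is the softmax mu_i proportional to exp(c_i / tau),
    a standard closed form for entropy-regularized linear programs.
    """
    z = np.exp((c - c.max()) / tau)  # shift by c.max() for numerical stability
    return z / z.sum()

c = np.array([1.0, 0.5, 0.2])       # illustrative linear objective
for tau in [1.0, 0.1, 0.01]:
    mu = entropic_lp_simplex(c, tau)
    print(tau, c.max() - c @ mu)    # bias of the regularized solution
```

As the temperature `tau` decreases, the regularized maximizer concentrates on the optimal vertex and the bias decays, mirroring how the flow's linear convergence translates into an error estimate for entropic regularization.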
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fisher-Rao gradient flow
Linear program convergence
State-action natural policy gradients
Johannes Müller
Department of Mathematics, RWTH Aachen University, Aachen, 52062, Germany
Semih Cayci
Assistant Professor, RWTH Aachen University
Reinforcement learning, deep learning theory, optimization
Guido Montúfar
Departments of Mathematics and Statistics & Data Science, University of California, Los Angeles, 90095, USA; Max Planck Institute for Mathematics in the Sciences, Leipzig, 04103, Germany