Randomized Advantage Transformation (RAT): Computing Natural Policy Gradients via Direct Backpropagation

📅 2026-05-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Natural policy gradient methods are computationally expensive and hard to scale because they must estimate and invert the Fisher information matrix. This work proposes Randomized Advantage Transformation (RAT), which uses the Woodbury matrix identity to reformulate the Tikhonov-regularized natural policy gradient as a standard policy gradient with a transformed advantage function. The transformation is computed with randomized block Kaczmarz iterations on on-policy mini-batches, so the method does not need to construct the Fisher matrix explicitly, run a conjugate-gradient solver, or rely on architecture-specific approximations. Because the resulting update is an ordinary policy gradient, it can be computed by direct backpropagation, which keeps the method general and simple to implement. The authors provide convergence guarantees and report that RAT matches or exceeds established natural-gradient methods on continuous and visual control benchmarks while remaining compatible with various network architectures.
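
The reformulation can be sketched with the push-through form of the Woodbury identity. The notation below is an assumption made for illustration and may differ from the paper's: $J \in \mathbb{R}^{n \times d}$ stacks the per-sample score vectors $\nabla_\theta \log \pi_\theta(a_i \mid s_i)$, $A \in \mathbb{R}^n$ holds the advantage estimates, $\lambda > 0$ is the Tikhonov damping, and the Fisher matrix is estimated empirically as $F = \tfrac{1}{n} J^\top J$.

```latex
\begin{align*}
g &= \tfrac{1}{n} J^\top A, \qquad F = \tfrac{1}{n} J^\top J, \\
(F + \lambda I_d)^{-1} g
  &= \tfrac{1}{n} \bigl( \tfrac{1}{n} J^\top J + \lambda I_d \bigr)^{-1} J^\top A \\
  &= \tfrac{1}{n} J^\top \bigl( \tfrac{1}{n} J J^\top + \lambda I_n \bigr)^{-1} A
   = \tfrac{1}{n} J^\top \tilde{A}, \\
\tilde{A} &:= \bigl( \tfrac{1}{n} J J^\top + \lambda I_n \bigr)^{-1} A .
\end{align*}
```

Under these assumptions the final expression has the same form as the vanilla gradient $g$ with $A$ replaced by $\tilde{A}$, so the linear system to solve lives in the $n$-dimensional sample space rather than the $d$-dimensional parameter space.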

📝 Abstract

Natural policy gradients improve optimization by accounting for the geometry of distribution space, but their practical use is limited by the cost of estimating and inverting the Fisher matrix. We present Randomized Advantage Transformation (RAT), a method for estimating Tikhonov-regularized natural policy gradients via direct backpropagation. By applying the Woodbury formula, we reformulate the regularized natural policy gradients as vanilla policy gradients with a transformed advantage. RAT computes this transformation efficiently via randomized block Kaczmarz iterations on on-policy mini-batches, avoiding explicit Fisher construction, conjugate-gradient solvers, and architecture-specific approximations. We provide convergence guarantees for RAT and demonstrate empirically that it matches or exceeds established natural-gradient methods across continuous and visual control benchmarks, while remaining simple to implement and compatible with various architectures.
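
As a rough numerical illustration of the idea in the abstract, the NumPy sketch below checks that a damped natural gradient equals a vanilla gradient taken with a transformed advantage, and then approximates that transformed advantage with a textbook randomized block Kaczmarz solver. It is not the paper's algorithm: the names `J`, `adv`, `lam`, and `block_kaczmarz`, the empirical Fisher estimate `J.T @ J / n`, and the choice to form the small `n x n` system densely are all assumptions made for this toy example.

```python
import numpy as np


def block_kaczmarz(M, b, block_size, num_iters, rng):
    """Generic randomized block Kaczmarz for M x = b (illustrative only)."""
    n = M.shape[0]
    x = np.zeros(n)
    for _ in range(num_iters):
        # Sample a random block of rows of the system.
        idx = rng.choice(n, size=block_size, replace=False)
        M_s = M[idx]                     # shape (block_size, n)
        resid = b[idx] - M_s @ x         # residual on the sampled rows
        # Minimum-norm correction that zeroes the sampled residual:
        # x += M_s^T (M_s M_s^T)^{-1} resid
        x = x + M_s.T @ np.linalg.solve(M_s @ M_s.T, resid)
    return x


rng = np.random.default_rng(0)
n, d, lam = 32, 200, 0.5                  # samples, parameters, damping (assumed)
J = rng.normal(size=(n, d)) / np.sqrt(d)  # stand-in for per-sample score vectors
adv = rng.normal(size=n)                  # stand-in for advantage estimates

# Parameter-space route: solve a d x d damped Fisher system.
F = J.T @ J / n
g = J.T @ adv / n
nat_grad = np.linalg.solve(F + lam * np.eye(d), g)

# Sample-space route: transform the advantage by solving an n x n system.
M = J @ J.T / n + lam * np.eye(n)
adv_direct = np.linalg.solve(M, adv)
adv_kacz = block_kaczmarz(M, adv, block_size=8, num_iters=500, rng=rng)

# Vanilla-style gradient computed with the transformed advantage.
grad_from_adv = J.T @ adv_kacz / n
```

In a real training loop the last line would correspond to backpropagating a standard policy-gradient loss weighted by the transformed advantage; here `J` is formed explicitly only so that the two routes can be compared directly.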

Problem

Research questions and friction points this paper is trying to address.

natural policy gradients

Fisher matrix

computational cost

reinforcement learning

policy optimization

Innovation

Methods, ideas, or system contributions that make the work stand out.

Natural Policy Gradient

Randomized Advantage Transformation

Woodbury Formula