🤖 AI Summary
This work proposes an efficient natural policy optimization method based on a rank-1 approximation of the inverse Fisher information matrix (FIM), addressing the high computational cost associated with frequent exact FIM inversion in natural policy gradient algorithms. The approach significantly reduces computational complexity while preserving convergence guarantees. Notably, the theoretical analysis establishes, for the first time, that under certain conditions this approximation can achieve faster convergence than standard policy gradient methods, with sample complexity comparable to that of stochastic policy gradients. Empirical evaluations demonstrate that the proposed method consistently outperforms standard Actor-Critic and trust-region baselines across a range of reinforcement learning environments.
📝 Abstract
Natural gradients have long been studied in deep reinforcement learning due to their fast convergence properties and covariant weight updates. However, computing natural gradients requires inverting the Fisher Information Matrix (FIM) at each iteration, which is computationally prohibitive. In this paper, we present an efficient and scalable natural policy optimization technique that leverages a rank-1 approximation to the full inverse FIM. We theoretically show that, under certain conditions, a rank-1 approximation to the inverse FIM converges faster than policy gradients and, under further conditions, enjoys the same sample complexity as stochastic policy gradient methods. We benchmark our method on a diverse set of environments and show that it achieves superior performance to standard actor-critic and trust-region baselines.
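To make the idea concrete, here is a minimal sketch of how a rank-1 (plus damping) surrogate of the FIM yields a closed-form natural-gradient step with no matrix inversion. The specific surrogate `F ≈ λI + g gᵀ` and the helper name are illustrative assumptions, not the paper's exact construction:

```python
import numpy as np

def rank1_natural_gradient(g, lam=1e-3):
    """Hypothetical sketch: natural-gradient step under a rank-1 FIM model.

    Assumes F ≈ lam * I + g g^T (an illustrative rank-1-plus-damping
    surrogate, not necessarily the paper's construction). By the
    Sherman-Morrison identity,
        (lam * I + g g^T)^{-1} g = g / (lam + g^T g),
    so the "natural" step is a simple rescaling of g, costing O(d)
    instead of the O(d^3) of an exact FIM inversion.
    """
    return g / (lam + g @ g)

# Usage: rescale a policy-gradient estimate before the parameter update.
g = np.array([3.0, 4.0])
step = rank1_natural_gradient(g, lam=1.0)
```

The Sherman-Morrison step is what makes a rank-1 structure attractive: the inverse of a rank-1 update is available in closed form, so the per-iteration cost stays linear in the number of policy parameters.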