🤖 AI Summary
Nonparametric reinforcement learning (RL) policy optimization in reproducing kernel Hilbert spaces (RKHS) has been limited to first-order methods, because the infinite-dimensional nature of the Hessian operator makes direct application of second-order optimization intractable.
Method: This paper introduces the first second-order optimization framework tailored to RKHS policy representations. The proposed algorithm sidesteps explicit computation and inversion of the infinite-dimensional Hessian by adding cubic regularization to an auxiliary objective and invoking the representer theorem to reduce the optimization to a tractable finite-dimensional problem.
Contribution/Results: The authors establish theoretical guarantees of local quadratic convergence. Empirical evaluation on a financial portfolio allocation task confirms the predicted convergence behavior, while standard RL benchmarks show substantial improvements over existing RKHS-based first-order methods and parametric second-order approaches, with faster convergence and higher cumulative rewards.
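Schematically, the cubic-regularized auxiliary step described above can be written as follows. The notation here is ours (σ denotes the cubic regularization weight, J the expected return, k the kernel); the paper's exact formulation may differ:

```latex
% Cubic-regularized Newton step over the policy's RKHS \mathcal{H}:
\Delta^{*} \;=\; \arg\min_{\Delta \in \mathcal{H}}
  \big\langle \nabla J(\pi), \Delta \big\rangle_{\mathcal{H}}
  \;+\; \tfrac{1}{2} \big\langle \Delta,\, \nabla^{2} J(\pi)\,\Delta \big\rangle_{\mathcal{H}}
  \;+\; \tfrac{\sigma}{3}\, \lVert \Delta \rVert_{\mathcal{H}}^{3}

% By the representer theorem, a minimizer lies in the span of kernel
% sections at the n sampled trajectory states x_1, ..., x_n, so the
% search reduces to coefficients \alpha \in \mathbb{R}^{n}:
\Delta^{*} \;=\; \sum_{i=1}^{n} \alpha_i\, k(x_i, \cdot)
```

The cubic term is what lets the update avoid inverting the Hessian operator: the regularized model is minimized directly, and its minimizer is well-defined even where the Hessian is indefinite.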
📝 Abstract
Reinforcement learning (RL) policies represented in Reproducing Kernel Hilbert Spaces (RKHS) offer powerful representational capabilities. While second-order optimization methods like Newton's method demonstrate faster convergence than first-order approaches, current RKHS-based policy optimization remains constrained to first-order techniques. This limitation stems primarily from the intractability of explicitly computing and inverting the infinite-dimensional Hessian operator in RKHS. We introduce Policy Newton in RKHS, the first second-order optimization framework specifically designed for RL policies represented in RKHS. Our approach circumvents direct computation of the inverse Hessian operator by optimizing a cubic regularized auxiliary objective function. Crucially, we leverage the Representer Theorem to transform this infinite-dimensional optimization into an equivalent, computationally tractable finite-dimensional problem whose dimensionality scales with the trajectory data volume. We establish theoretical guarantees proving convergence to a local optimum with a local quadratic convergence rate. Empirical evaluations on a toy financial asset allocation problem validate these theoretical properties, while experiments on standard RL benchmarks demonstrate that Policy Newton in RKHS achieves superior convergence speed and higher episodic rewards compared to established first-order RKHS approaches and parametric second-order methods. Our work bridges a critical gap between non-parametric policy representations and second-order optimization methods in reinforcement learning.
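To make the finite-dimensional reduction concrete, here is a minimal sketch of minimizing a cubic-regularized local model over representer coefficients. The function name, the toy values, and the use of plain gradient descent are illustrative assumptions, not the paper's actual solver:

```python
import numpy as np

def solve_cubic_subproblem(g, H, sigma, lr=0.01, steps=2000):
    """Approximately minimize the finite-dimensional model
        m(a) = g @ a + 0.5 * a @ H @ a + (sigma / 3) * ||a||^3
    over representer coefficients a, via gradient descent
    (an illustrative stand-in for a dedicated subproblem solver)."""
    a = np.zeros_like(g)
    for _ in range(steps):
        # Gradient of m: g + H a + sigma * ||a|| * a
        grad = g + H @ a + sigma * np.linalg.norm(a) * a
        a = a - lr * grad
    return a

# Toy instance: gradient and Hessian expressed in the finite-dimensional
# span given by the representer theorem (values made up for illustration).
g = np.array([1.0, -2.0])
H = np.array([[2.0, 0.0], [0.0, 1.0]])
a = solve_cubic_subproblem(g, H, sigma=1.0)
model_value = g @ a + 0.5 * a @ H @ a + np.linalg.norm(a) ** 3 / 3
# The computed step should improve on a = 0, where the model value is 0.
assert model_value < 0
```

In the actual algorithm the coefficient dimension n grows with the volume of sampled trajectory data, which is what makes the otherwise infinite-dimensional Newton step computationally tractable.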