Quasi-Newton Compatible Actor-Critic for Deterministic Policies

📅 2025-11-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the slow convergence and suboptimal performance of Deterministic Policy Gradient (DPG) methods—stemming from their lack of curvature information—this paper proposes the Second-Order Deterministic Policy Gradient (2nd-DPG) framework. The core methodological innovation is a quadratic critic built on compatible function approximation, which explicitly models the Hessian structure of the performance objective. This enables joint approximation of the policy gradient and Hessian without second-order backpropagation, thereby facilitating quasi-Newton-style policy updates. Critic parameters are estimated efficiently via Least-Squares Temporal Difference (LSTD) learning, and the algorithm applies to any differentiable deterministic policy. Empirical evaluation on multiple continuous-control benchmarks shows that 2nd-DPG improves both convergence speed and final policy performance over standard actor-critic baselines.
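As a rough illustration of the quasi-Newton-style update the summary describes, the sketch below applies a damped Newton step to a toy quadratic objective. The toy objective, the `damping` parameter, and all variable names are assumptions for exposition, not the paper's algorithm; in 2nd-DPG the gradient and Hessian estimates would come from the learned quadratic critic.

```python
import numpy as np

def quasi_newton_step(theta, g, H, damping=1e-3):
    """One damped quasi-Newton update: theta <- theta - (H + damping*I)^{-1} g.

    Damping keeps the Hessian estimate invertible; here (g, H) are exact,
    whereas in 2nd-DPG they would be critic-supplied estimates (assumption)."""
    H_reg = H + damping * np.eye(len(theta))
    return theta - np.linalg.solve(H_reg, g)

# Toy objective to minimize: f(theta) = 0.5 theta^T A theta - b^T theta,
# so the gradient is A @ theta - b and the Hessian is A.
A = np.array([[3.0, 0.5],
              [0.5, 1.0]])
b = np.array([1.0, -2.0])

theta = np.zeros(2)
for _ in range(5):
    g = A @ theta - b
    theta = quasi_newton_step(theta, g, A)

# theta approaches the minimizer A^{-1} b within a few steps, illustrating
# the faster convergence curvature information buys over gradient descent.
```

Because the step rescales the gradient by the inverse Hessian, convergence on a quadratic is essentially one-shot up to the damping term, which is the speedup a quasi-Newton actor update targets over first-order DPG.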

📝 Abstract
In this paper, we propose a second-order deterministic actor-critic framework in reinforcement learning that extends the classical deterministic policy gradient method to exploit curvature information of the performance function. Building on the concept of compatible function approximation for the critic, we introduce a quadratic critic that simultaneously preserves the true policy gradient and an approximation of the performance Hessian. A least-squares temporal difference learning scheme is then developed to estimate the quadratic critic parameters efficiently. This construction enables a quasi-Newton actor update using information learned by the critic, yielding faster convergence compared to first-order methods. The proposed approach is general and applicable to any differentiable policy class. Numerical examples demonstrate that the method achieves improved convergence and performance over standard deterministic actor-critic baselines.
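The least-squares temporal difference (LSTD) step mentioned in the abstract can be sketched in isolation. The two-state chain, one-hot features, and rewards below are illustrative assumptions, not the paper's construction; the point is only how LSTD recovers critic weights in closed form from sampled transitions by solving a linear system.

```python
import numpy as np

# LSTD(0) on a toy deterministic 2-state chain (illustrative assumption):
# state 0 -> state 1 -> state 0, reward 1 on leaving state 0, else 0.
gamma = 0.9
phi = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 1.0])}  # one-hot features
P = {0: 1, 1: 0}       # deterministic transitions
r = {0: 1.0, 1: 0.0}   # rewards

# Accumulate the LSTD statistics A = sum phi(s)(phi(s) - gamma*phi(s'))^T
# and b = sum phi(s) r(s) over sampled transitions.
A = np.zeros((2, 2))
b = np.zeros(2)
s = 0
for _ in range(1000):
    s_next = P[s]
    A += np.outer(phi[s], phi[s] - gamma * phi[s_next])
    b += phi[s] * r[s]
    s = s_next

# Closed-form critic weights; with one-hot features these are the values
# V(0) = 1/(1 - gamma^2) and V(1) = gamma/(1 - gamma^2) of the chain.
w = np.linalg.solve(A, b)
```

The same one-shot solve is what makes LSTD attractive for estimating the quadratic critic's parameters: no step-size tuning, and all sampled transitions are used at once.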
Problem

Research questions and friction points this paper is trying to address.

Extends deterministic policy gradient to exploit performance curvature information
Develops quadratic critic preserving true gradient and Hessian approximation
Enables quasi-Newton actor updates for faster convergence than first-order methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Second-order actor-critic framework using curvature information
Quadratic critic preserving policy gradient and Hessian approximation
Quasi-Newton actor update enabled by critic-learned information
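A minimal sketch of what a quadratic-in-action compatible critic might look like; the parameterization, toy policy, and every name below are assumptions for exposition, not the paper's exact architecture.

```python
import numpy as np

def quadratic_critic(s, a, mu, grad_mu, w, W, v):
    """Q(s, a) = (a - mu(s))^T grad_mu(s)^T w     (compatible linear term)
               + 0.5 (a - mu(s))^T W (a - mu(s))  (quadratic curvature term)
               + v^T s                            (state-value baseline)."""
    d = a - mu(s)
    return d @ grad_mu(s).T @ w + 0.5 * d @ W @ d + v @ s

mu = lambda s: 0.5 * s              # toy deterministic policy (assumption)
grad_mu = lambda s: np.diag(s)      # toy policy Jacobian (assumption)
w = np.array([0.2, -0.1])           # linear critic weights
W = -np.eye(2)                      # curvature block, kept negative definite
v = np.array([1.0, 1.0])            # baseline weights

s = np.array([1.0, 2.0])
# At the on-policy action a = mu(s) the action-deviation terms vanish,
# so the critic reduces to the baseline v^T s.
q_on_policy = quadratic_critic(s, mu(s), mu, grad_mu, w, W, v)
```

The design intent of such a form is that the action-gradient of Q at a = mu(s) is grad_mu(s)^T w (the compatibility property preserving the true policy gradient), while the W block supplies the curvature estimate that drives the quasi-Newton actor update.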