Problem
Research questions and friction points this paper is trying to address.
Extends the deterministic policy gradient method to exploit curvature information about the performance function
Develops a quadratic critic that preserves both the true policy gradient and an approximation of the performance Hessian
Enables quasi-Newton actor updates for faster convergence than first-order methods
Innovation
Methods, ideas, or system contributions that make the work stand out.
Second-order actor-critic framework using curvature information
Quadratic critic that preserves both the policy gradient and an approximation of the Hessian
Quasi-Newton actor update built from the gradient and curvature information learned by the critic
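
To make the contrast between a first-order and a quasi-Newton actor update concrete, the sketch below compares the two on a toy concave quadratic. It is only an illustration under stated assumptions, not the paper's algorithm: the performance function, the matrix A, the optimum theta_star, and the helper names gradient_step and quasi_newton_step are all hypothetical, and the exact gradient and Hessian are used here in place of quantities that a critic would have to estimate.

```python
import numpy as np


def gradient_step(theta, grad, step_size):
    """First-order ascent step: move along the gradient."""
    return theta + step_size * grad


def quasi_newton_step(theta, grad, hess_approx, damping=1e-6):
    """Curvature-scaled ascent step.

    hess_approx stands in for a Hessian estimate (assumed symmetric and
    negative definite near a maximum). The small damping term keeps the
    linear system well posed.
    """
    curvature = -hess_approx + damping * np.eye(theta.size)
    return theta + np.linalg.solve(curvature, grad)


# Toy performance (an assumption for illustration):
# J(theta) = -0.5 * (theta - theta_star)^T A (theta - theta_star)
A = np.diag([1.0, 100.0])          # badly conditioned curvature
theta_star = np.array([2.0, -1.0])  # maximizer of J


def grad_J(theta):
    return -A @ (theta - theta_star)


# Ten first-order steps; the step size must stay below 2/100 for stability.
theta_gd = np.zeros(2)
for _ in range(10):
    theta_gd = gradient_step(theta_gd, grad_J(theta_gd), step_size=0.01)

# One quasi-Newton step using the exact Hessian -A as the curvature estimate.
theta_qn = quasi_newton_step(np.zeros(2), grad_J(np.zeros(2)), hess_approx=-A)

err_gd = np.linalg.norm(theta_gd - theta_star)
err_qn = np.linalg.norm(theta_qn - theta_star)
print(f"gradient ascent error after 10 steps: {err_gd:.3f}")
print(f"quasi-Newton error after 1 step:      {err_qn:.2e}")
```

On this toy problem the first-order step size is limited by the stiffest direction, so progress along the flat direction is slow, while the curvature-scaled step rescales each direction separately. How closely this carries over to a learned critic depends on the quality of the Hessian approximation, which the sketch does not model.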