🤖 AI Summary
This work addresses the bottleneck in conventional methods for solving linear systems involving the Hessian matrix of a deep neural network: forming the Hessian and then solving the system takes time cubic and memory quadratic in the number of layers. The authors propose an exact algorithm that directly computes the product of the inverse Hessian with any given vector, without explicitly constructing or storing either the Hessian or its inverse. Leveraging a recursive structure, the method is particularly well-suited for “tall-and-skinny” networks (those with many layers but few parameters per layer) and achieves time and memory complexity linear in the network depth while remaining exact rather than approximate. To the best of the authors’ knowledge, this is the first approach to enable linear-complexity computation of inverse-Hessian-vector products for deep networks.
📝 Abstract
We describe an exact algorithm to solve linear systems of the form $Hx=b$ where $H$ is the Hessian of a deep net. The method computes Hessian-inverse-vector products without storing the Hessian or its inverse. It requires time and storage that scale linearly in the number of layers. This is in contrast to the naive approach of first computing the Hessian, then solving the linear system, which takes storage and time that are respectively quadratic and cubic in the number of layers. The Hessian-inverse-vector product method scales roughly like Pearlmutter's algorithm for computing Hessian-vector products.
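To make the comparison concrete, the sketch below illustrates the two reference points the abstract names: a Pearlmutter-style Hessian-vector product and the naive "form $H$, then solve $Hx=b$" baseline. It is not the paper's inverse-Hessian algorithm. The toy model (a chain of scalar `tanh` layers with one weight per layer and a squared loss), the dual-number class, and every function name are assumptions chosen for illustration only.

```python
import math

class Dual:
    """Number val + eps*e with e**2 == 0; `eps` carries a directional derivative."""
    def __init__(self, val, eps=0.0):
        self.val, self.eps = val, eps
    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(x)
    def __add__(self, other):
        o = Dual.lift(other)
        return Dual(self.val + o.val, self.eps + o.eps)
    __radd__ = __add__
    def __sub__(self, other):
        o = Dual.lift(other)
        return Dual(self.val - o.val, self.eps - o.eps)
    def __rsub__(self, other):
        return Dual.lift(other) - self
    def __mul__(self, other):
        o = Dual.lift(other)
        return Dual(self.val * o.val, self.val * o.eps + self.eps * o.val)
    __rmul__ = __mul__

def tanh(z):
    if isinstance(z, Dual):
        t = math.tanh(z.val)
        return Dual(t, (1.0 - t * t) * z.eps)
    return math.tanh(z)

def grad(w, x, y):
    """Backprop gradient of 0.5*(a_L - y)**2 with a_l = tanh(w_l * a_{l-1}), a_0 = x."""
    acts = [x]
    for wl in w:
        acts.append(tanh(wl * acts[-1]))
    delta = acts[-1] - y                      # d loss / d a_L
    g = [None] * len(w)
    for l in reversed(range(len(w))):
        gz = delta * (1.0 - acts[l + 1] * acts[l + 1])  # through tanh
        g[l] = gz * acts[l]                   # d loss / d w_l
        delta = gz * w[l]                     # d loss / d a_{l-1}
    return g

def hvp(w, v, x, y):
    """Pearlmutter-style H @ v: differentiate the gradient along direction v."""
    g = grad([Dual(wi, vi) for wi, vi in zip(w, v)], x, y)
    return [gi.eps for gi in g]

def naive_solve(w, b, x, y):
    """Baseline: build the dense Hessian column by column, then eliminate."""
    n = len(w)
    cols = [hvp(w, [float(i == j) for i in range(n)], x, y) for j in range(n)]
    # Augmented matrix [H | b], with H[i][j] = cols[j][i].
    A = [[cols[j][i] for j in range(n)] + [b[i]] for i in range(n)]
    for k in range(n):                        # elimination with partial pivoting
        p = max(range(k, n), key=lambda i: abs(A[i][k]))
        A[k], A[p] = A[p], A[k]
        for i in range(k + 1, n):
            f = A[i][k] / A[k][k]
            for j in range(k, n + 1):
                A[i][j] -= f * A[k][j]
    sol = [0.0] * n
    for i in reversed(range(n)):              # back substitution
        s = sum(A[i][j] * sol[j] for j in range(i + 1, n))
        sol[i] = (A[i][n] - s) / A[i][i]
    return sol

# Hypothetical 4-layer example; all values are arbitrary.
w = [0.8, 1.1, 0.9, 1.2]
x_in, y_target = 0.5, -1.0
b = [1.0, -2.0, 0.5, 3.0]
sol = naive_solve(w, b, x_in, y_target)
residual = max(abs(h - bi) for h, bi in zip(hvp(w, sol, x_in, y_target), b))
```

In this toy setting, `hvp` costs about as much as one gradient evaluation, so it is linear in the number of layers and never stores $H$. `naive_solve` stores all entries of $H$ (quadratic in depth) and runs a cubic-time elimination. The abstract's claim is that $H^{-1}b$ can be computed at roughly the cost of `hvp` rather than `naive_solve`.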