Spectral-factorized Positive-definite Curvature Learning for NN Training

📅 2025-02-10
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Non-diagonal preconditioning methods such as Shampoo suffer from expensive matrix-root computations: the spectral decomposition must be re-run as the curvature changes, the root order is fixed, and only limited kinds of curvature can be modeled. Method: This paper proposes a Riemannian-optimization framework that learns a positive-definite curvature matrix directly in spectral-factorized form. It is the first to embed spectral factorization into dynamic curvature estimation, decoupling eigenvalue and eigenvector updates so that matrix roots of arbitrary order reduce to O(d) elementwise powers of the eigenvalues, bypassing the usual constraints on decomposition paradigms and root exponents. The framework integrates Riemannian manifold optimization, block-diagonal approximation, and low-rank updates to balance accuracy and efficiency. Results: Experiments across diverse neural-network tasks report 23–41% faster convergence than AdamW and Shampoo, 58% lower memory overhead, and significantly improved covariance-adaptation fidelity.
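The core trick is easiest to see in code. Below is a minimal sketch (hypothetical helper name, not the paper's actual implementation): once the curvature is maintained as C = B diag(d) Bᵀ with B orthogonal and d > 0, applying C^(-1/p) to a gradient needs only an elementwise power of the eigenvalues, for any root order p, with no per-step eigendecomposition.

```python
import numpy as np
from scipy.linalg import fractional_matrix_power

def apply_root(B, d, g, p):
    """Apply C^(-1/p) @ g with C = B @ diag(d) @ B.T kept in
    spectral-factorized form (B orthogonal, d > 0). The root itself
    is an O(d) elementwise power; no decomposition is needed here."""
    return B @ (d ** (-1.0 / p) * (B.T @ g))

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
C = A @ A.T + 6 * np.eye(6)   # a symmetric positive-definite "curvature"
d, B = np.linalg.eigh(C)      # factors (the paper learns these online instead)
g = rng.standard_normal(6)

for p in (1, 2, 4):           # inverse, inverse square root, fourth root
    assert np.allclose(apply_root(B, d, g, p),
                       fractional_matrix_power(C, -1.0 / p) @ g)
```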

๐Ÿ“ Abstract
Many training methods, such as Adam(W) and Shampoo, learn a positive-definite curvature matrix and apply an inverse root before preconditioning. Recently, non-diagonal training methods, such as Shampoo, have gained significant attention; however, they remain computationally inefficient and are limited to specific types of curvature information due to the costly matrix root computation via matrix decomposition. To address this, we propose a Riemannian optimization approach that dynamically adapts spectral-factorized positive-definite curvature estimates, enabling the efficient application of arbitrary matrix roots and generic curvature learning. We demonstrate the efficacy and versatility of our approach in positive-definite matrix optimization and covariance adaptation for gradient-free optimization, as well as its efficiency in curvature learning for neural net training.
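To make the abstract's framing concrete, here is a simplified contrast (illustrative only; real Shampoo uses Kronecker-factored roots per layer dimension, not one dense matrix): Adam(W) is the diagonal special case where the inverse root is elementwise and cheap, while a non-diagonal curvature conventionally requires an O(d³) eigendecomposition whenever its root is recomputed, which is the cost this paper targets.

```python
import numpy as np

def adam_style_step(g, v, eps=1e-8):
    """Diagonal curvature diag(v), inverse square root: O(d) per step."""
    return g / (np.sqrt(v) + eps)

def shampoo_style_step(g, C, p=4):
    """Dense curvature factor C: applying C^(-1/p) conventionally needs
    an O(d^3) eigendecomposition each time the root is refreshed."""
    d, B = np.linalg.eigh(C)
    return B @ (d ** (-1.0 / p) * (B.T @ g))
```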
Problem

Research questions and friction points this paper is trying to address.

Efficient computation of matrix roots in training
Dynamic adaptation of spectral-factorized curvature estimates
Versatile application in neural network optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Riemannian optimization dynamically adapts spectral-factorized curvature estimates (see the sketch after this list)
Efficient arbitrary matrix roots application
Generic curvature learning for neural nets
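A hedged sketch of what the decoupled update could look like (hypothetical names and update rules, not the paper's exact algorithm): eigenvalues are updated in log-space so they stay strictly positive, and the eigenvector basis is moved along a skew-symmetric tangent direction and retracted back onto the orthogonal group with a QR decomposition, so the learned curvature stays positive-definite by construction.

```python
import numpy as np

def decoupled_update(B, log_d, G, lr_B=0.05, lr_d=0.1):
    """One illustrative update of a spectral-factorized curvature
    C = B @ diag(exp(log_d)) @ B.T toward a curvature statistic G
    (e.g., an averaged gradient outer product). Hypothetical sketch.

    - Eigenvalues move in log-space, so they remain strictly positive.
    - Eigenvectors move along a skew-symmetric (tangent) direction and
      are retracted onto the orthogonal group via QR, so C remains
      positive-definite by construction."""
    M = B.T @ G @ B                      # statistic in the current eigenbasis
    # Eigenvalue step: track the diagonal of M in log-space.
    log_d = log_d + lr_d * (np.log(np.clip(np.diag(M), 1e-12, None)) - log_d)
    # Eigenvector step: a Brockett-style skew-symmetric generator built
    # from M drives a rotation of the basis (sign and step size are
    # illustrative only).
    N = np.diag(np.diag(M))
    S = N @ M - M @ N                    # skew-symmetric since M is symmetric
    Q, R = np.linalg.qr(B @ (np.eye(len(B)) + lr_B * S))
    B = Q * np.sign(np.diag(R))          # QR retraction with fixed column signs
    return B, log_d
```

Because B stays orthogonal and exp(log_d) stays positive, no projection back onto the positive-definite cone is ever needed, which is the structural payoff of learning the curvature in factorized form.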
👥 Authors
Wu Lin
Vector Institute, Canada; University of Toronto, Canada

Felix Dangel
Postdoc at the Vector Institute, Toronto
Second-order optimization, automatic differentiation, deep neural networks, tensor networks

Runa Eschenhagen
PhD student, University of Cambridge
Machine Learning

Juhan Bae
University of Toronto
Machine Learning

Richard E. Turner
Cambridge University, United Kingdom

Roger B. Grosse
Vector Institute, Canada; University of Toronto, Canada