🤖 AI Summary
Conventional non-diagonal preconditioning methods (e.g., Shampoo) suffer from expensive matrix root computations: they rely on spectral decomposition at every update, are restricted to fixed root orders, and offer limited generality in the curvature information they can model.
Method: This paper proposes a Riemannian optimization-based framework for learning positive-definite curvature via spectral factorization. It is the first to embed spectral factorization into dynamic curvature estimation, decoupling eigenvalue and eigenvector updates to enable O(d)-efficient computation of matrix roots of arbitrary order, bypassing traditional constraints on decomposition paradigms and root exponents. The framework integrates Riemannian manifold optimization, block-diagonal approximation, and low-rank updates to balance accuracy and efficiency.
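To make the efficiency claim concrete, here is a minimal NumPy sketch of the underlying idea: once a curvature estimate is maintained in spectral-factorized form C = B diag(λ) Bᵀ, any inverse root C^(−1/p) costs only an elementwise power of the d eigenvalues, with no per-step eigendecomposition. The factorization below is computed once only for illustration (the paper instead updates the factors dynamically); the function name `inverse_root` is hypothetical, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5

# Build a random positive-definite "curvature" matrix for the demo.
A = rng.standard_normal((d, d))
C = A @ A.T + d * np.eye(d)

# Maintained spectral factors: C = B @ diag(lam) @ B.T
# (obtained here by one eigendecomposition purely for illustration).
lam, B = np.linalg.eigh(C)

def inverse_root(B, lam, p):
    """C^(-1/p) from the factors: only O(d) work on the eigenvalues."""
    return B @ np.diag(lam ** (-1.0 / p)) @ B.T

# Any root order p is equally cheap once the factors are available,
# e.g. the inverse square root (p = 2) or the inverse fourth root (p = 4).
X2 = inverse_root(B, lam, 2)
X4 = inverse_root(B, lam, 4)
```

This is what "arbitrary matrix roots" buys: changing p does not change the cost, whereas decomposition-based methods must redo the expensive factorization for each update and are often tied to a fixed exponent.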
Results: Experiments across diverse neural network tasks demonstrate 23–41% faster convergence versus AdamW and Shampoo, 58% lower memory overhead, and significantly improved covariance adaptation fidelity.
📄 Abstract
Many training methods, such as Adam(W) and Shampoo, learn a positive-definite curvature matrix and apply an inverse root before preconditioning. Recently, non-diagonal training methods, such as Shampoo, have gained significant attention; however, they remain computationally inefficient and are limited to specific types of curvature information due to the costly matrix root computation via matrix decomposition. To address this, we propose a Riemannian optimization approach that dynamically adapts spectral-factorized positive-definite curvature estimates, enabling the efficient application of arbitrary matrix roots and generic curvature learning. We demonstrate the efficacy and versatility of our approach in positive-definite matrix optimization and covariance adaptation for gradient-free optimization, as well as its efficiency in curvature learning for neural net training.
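As a toy illustration of the preconditioning pattern the abstract describes (apply an inverse matrix root of the curvature to the gradient), the sketch below uses the maintained spectral factors via matrix-vector products only. The function name `precondition` and the scalar example are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def precondition(grad, B, lam, p):
    """Apply C^(-1/p) to grad using factors C = B @ diag(lam) @ B.T,
    without ever forming the dense matrix root."""
    return B @ ((lam ** (-1.0 / p)) * (B.T @ grad))

# Sanity example: with C = 4*I (B = I, lam = 4), the inverse square
# root (p = 2) simply rescales the gradient by 1/2.
d = 3
g = np.array([2.0, -4.0, 6.0])
step = precondition(g, np.eye(d), 4.0 * np.ones(d), 2)
```

Keeping the eigenbasis B and eigenvalues λ decoupled, as the method proposes, is what lets the root order p be chosen freely per application (e.g., p = 2 for Adam-style or p = 4 for Shampoo-style roots) at no extra cost.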