Spectral-factorized Positive-definite Curvature Learning for NN Training

📅 2025-02-10
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Non-diagonal preconditioning methods such as Shampoo suffer from expensive matrix-root computations: the spectral decomposition must be re-run as the curvature changes, the root order is fixed, and only limited kinds of curvature can be modeled. Method: This paper proposes a Riemannian-optimization framework that learns a positive-definite curvature matrix directly in spectral-factorized form. It is the first to embed spectral factorization into dynamic curvature estimation, decoupling eigenvalue and eigenvector updates so that matrix roots of arbitrary order reduce to O(d) elementwise powers of the eigenvalues, bypassing the usual constraints on decomposition paradigms and root exponents. The framework integrates Riemannian manifold optimization, block-diagonal approximation, and low-rank updates to balance accuracy and efficiency. Results: Experiments across diverse neural-network tasks report 23–41% faster convergence than AdamW and Shampoo, 58% lower memory overhead, and significantly improved covariance-adaptation fidelity.
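The core trick is easiest to see in code. Below is a minimal sketch (hypothetical helper name, not the paper's actual implementation): once the curvature is maintained as C = B diag(d) Bᵀ with B orthogonal and d > 0, applying C^(-1/p) to a gradient needs only an elementwise power of the eigenvalues, for any root order p, with no per-step eigendecomposition.

```python
import numpy as np
from scipy.linalg import fractional_matrix_power

def apply_root(B, d, g, p):
    """Apply C^(-1/p) @ g with C = B @ diag(d) @ B.T kept in
    spectral-factorized form (B orthogonal, d > 0). The root itself
    is an O(d) elementwise power; no decomposition is needed here."""
    return B @ (d ** (-1.0 / p) * (B.T @ g))

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
C = A @ A.T + 6 * np.eye(6)   # a symmetric positive-definite "curvature"
d, B = np.linalg.eigh(C)      # factors (the paper learns these online instead)
g = rng.standard_normal(6)

for p in (1, 2, 4):           # inverse, inverse square root, fourth root
    assert np.allclose(apply_root(B, d, g, p),
                       fractional_matrix_power(C, -1.0 / p) @ g)
```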

๐Ÿ“ Abstract
Many training methods, such as Adam(W) and Shampoo, learn a positive-definite curvature matrix and apply an inverse root before preconditioning. Recently, non-diagonal training methods, such as Shampoo, have gained significant attention; however, they remain computationally inefficient and are limited to specific types of curvature information due to the costly matrix root computation via matrix decomposition. To address this, we propose a Riemannian optimization approach that dynamically adapts spectral-factorized positive-definite curvature estimates, enabling the efficient application of arbitrary matrix roots and generic curvature learning. We demonstrate the efficacy and versatility of our approach in positive-definite matrix optimization and covariance adaptation for gradient-free optimization, as well as its efficiency in curvature learning for neural net training.
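To make the abstract's framing concrete, here is a simplified contrast (illustrative only; real Shampoo uses Kronecker-factored roots per layer dimension, not one dense matrix): Adam(W) is the diagonal special case where the inverse root is elementwise and cheap, while a non-diagonal curvature conventionally requires an O(d³) eigendecomposition whenever its root is recomputed, which is the cost this paper targets.

```python
import numpy as np

def adam_style_step(g, v, eps=1e-8):
    """Diagonal curvature diag(v), inverse square root: O(d) per step."""
    return g / (np.sqrt(v) + eps)

def shampoo_style_step(g, C, p=4):
    """Dense curvature factor C: applying C^(-1/p) conventionally needs
    an O(d^3) eigendecomposition each time the root is refreshed."""
    d, B = np.linalg.eigh(C)
    return B @ (d ** (-1.0 / p) * (B.T @ g))
```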
Problem

Research questions and friction points this paper is trying to address.

Efficient computation of matrix roots in training
Dynamic adaptation of spectral-factorized curvature estimates
Versatile application in neural network optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Riemannian optimization dynamically adapts spectral-factorized curvature estimates (see the sketch after this list)
Efficient arbitrary matrix roots application
Generic curvature learning for neural nets
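A hedged sketch of what the decoupled update could look like (hypothetical names and update rules, not the paper's exact algorithm): eigenvalues are updated in log-space so they stay strictly positive, and the eigenvector basis is moved along a skew-symmetric tangent direction and retracted back onto the orthogonal group with a QR decomposition, so the learned curvature stays positive-definite by construction.

```python
import numpy as np

def decoupled_update(B, log_d, G, lr_B=0.05, lr_d=0.1):
    """One illustrative update of a spectral-factorized curvature
    C = B @ diag(exp(log_d)) @ B.T toward a curvature statistic G
    (e.g., an averaged gradient outer product). Hypothetical sketch.

    - Eigenvalues move in log-space, so they remain strictly positive.
    - Eigenvectors move along a skew-symmetric (tangent) direction and
      are retracted onto the orthogonal group via QR, so C remains
      positive-definite by construction."""
    M = B.T @ G @ B                      # statistic in the current eigenbasis
    # Eigenvalue step: track the diagonal of M in log-space.
    log_d = log_d + lr_d * (np.log(np.clip(np.diag(M), 1e-12, None)) - log_d)
    # Eigenvector step: a Brockett-style skew-symmetric generator built
    # from M drives a rotation of the basis (sign and step size are
    # illustrative only).
    N = np.diag(np.diag(M))
    S = N @ M - M @ N                    # skew-symmetric since M is symmetric
    Q, R = np.linalg.qr(B @ (np.eye(len(B)) + lr_B * S))
    B = Q * np.sign(np.diag(R))          # QR retraction with fixed column signs
    return B, log_d
```

Because B stays orthogonal and exp(log_d) stays positive, no projection back onto the positive-definite cone is ever needed, which is the structural payoff of learning the curvature in factorized form.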
👥 Authors
Wu Lin
Vector Institute, Canada; University of Toronto, Canada

Felix Dangel
Postdoc at the Vector Institute, Toronto
Second-order optimization, automatic differentiation, deep neural networks, tensor networks

Runa Eschenhagen
PhD student, University of Cambridge
Machine Learning

Juhan Bae
University of Toronto
Machine Learning

Richard E. Turner
Cambridge University, United Kingdom

Roger B. Grosse
Vector Institute, Canada; University of Toronto, Canada