Sven: Singular Value Descent as a Computationally Efficient Natural Gradient Method

📅 2026-04-01
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the limitations of traditional natural gradient methods, which suffer from high computational complexity and poor scalability to large neural networks, as well as the slow convergence and limited accuracy of first-order gradient approaches. The authors propose Sven, an optimization algorithm that leverages the natural decomposition of the loss function over individual samples and computes the minimum-norm parameter update via the Moore–Penrose pseudoinverse of the loss Jacobian. By employing truncated singular value decomposition, Sven efficiently approximates this update, extending natural gradient principles to over-parameterized settings while recovering the classical natural gradient in the under-parameterized limit. Requiring only k times the computational cost of stochastic gradient descent (SGD), Sven achieves substantially faster convergence and higher accuracy. Empirical results demonstrate its superior performance on regression tasks, matching the effectiveness of L-BFGS at a significantly lower computational cost.
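The update described in the summary can be written compactly. In notation assumed here for illustration (not taken verbatim from the paper): with per-sample residual vector $r(\theta)$ and Jacobian $J = \partial r / \partial \theta$, the minimum-norm step and its rank-$k$ approximation via truncated SVD are

```latex
\Delta\theta = -\eta\, J^{+} r
  \approx -\eta\, V_k \Sigma_k^{-1} U_k^{\top} r,
\qquad
J = U \Sigma V^{\top},
```

where $U_k$, $\Sigma_k$, $V_k$ retain only the $k$ largest singular values and their singular vectors. Computing the top-$k$ factors rather than the full decomposition is what yields the stated factor-of-$k$ overhead relative to SGD.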
๐Ÿ“ Abstract
We introduce Sven (Singular Value dEsceNt), a new optimization algorithm for neural networks that exploits the natural decomposition of loss functions into a sum over individual data points, rather than reducing the full loss to a single scalar before computing a parameter update. Sven treats each data point's residual as a separate condition to be satisfied simultaneously, using the Moore-Penrose pseudoinverse of the loss Jacobian to find the minimum-norm parameter update that best satisfies all conditions at once. In practice, this pseudoinverse is approximated via a truncated singular value decomposition, retaining only the $k$ most significant directions and incurring a computational overhead of only a factor of $k$ relative to stochastic gradient descent. This is in comparison to traditional natural gradient methods, which scale as the square of the number of parameters. We show that Sven can be understood as a natural gradient method generalized to the over-parametrized regime, recovering natural gradient descent in the under-parametrized limit. On regression tasks, Sven significantly outperforms standard first-order methods including Adam, converging faster and to a lower final loss, while remaining competitive with L-BFGS at a fraction of the wall-time cost. We discuss the primary challenge to scaling, namely memory overhead, and propose mitigation strategies. Beyond standard machine learning benchmarks, we anticipate that Sven will find natural application in scientific computing settings where custom loss functions decompose into several conditions.
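The core step the abstract describes, a minimum-norm update through a truncated-SVD approximation of the Jacobian pseudoinverse, can be sketched in a few lines of NumPy. This is a minimal illustration under assumed notation, not the authors' implementation: it uses a full SVD for clarity, whereas a practical version would use an iterative or randomized top-$k$ solver to realize the stated factor-of-$k$ cost over SGD.

```python
import numpy as np

def sven_step(jacobian, residuals, k, lr=1.0):
    """One Sven-style update (sketch): minimum-norm step via a rank-k
    truncated pseudoinverse of the per-sample residual Jacobian.

    jacobian  : (n_points, n_params) array, row i is d r_i / d theta
    residuals : (n_points,) array of per-sample residuals r_i(theta)
    k         : number of singular directions to retain
    """
    # Full SVD for clarity only; a scalable version would compute just
    # the top-k factors (e.g. randomized or Lanczos methods).
    U, s, Vt = np.linalg.svd(jacobian, full_matrices=False)
    Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]
    # Truncated pseudoinverse applied to the residuals:
    # J^+_k r = V_k diag(1/s_k) U_k^T r
    return -lr * (Vtk.T @ ((Uk.T @ residuals) / sk))

# Toy usage: linear least squares, r(theta) = X theta - y, so J = X.
# Over-parameterized on purpose: 50 parameters, 20 data points.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))
y = rng.normal(size=20)
theta = np.zeros(50)
for _ in range(5):
    r = X @ theta - y
    theta += sven_step(X, r, k=10)
```

In this linear toy problem each step removes the residual components lying in the top-$k$ left-singular directions of $X$, so the loss drops monotonically; with $k$ at least the rank of $X$, a single unit-step would solve the system exactly.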
Problem

Research questions and friction points this paper is trying to address.

natural gradient
optimization
over-parameterized
computational efficiency
neural networks
Innovation

Methods, ideas, or system contributions that make the work stand out.

natural gradient
singular value decomposition
over-parameterized optimization
Moore-Penrose pseudoinverse
loss decomposition
Samuel Bright-Thonney
Department of Physics, Massachusetts Institute of Technology
Thomas R. Harvey
Department of Physics, Massachusetts Institute of Technology
Andre Lukas
Rudolf Peierls Centre for Theoretical Physics, University of Oxford
Jesse Thaler
MIT Physics
Theoretical Particle Physics