🤖 AI Summary
This work addresses the lack of theoretical convergence guarantees for subsampled natural gradient descent (SNGD) and its accelerated variant SPRING in strongly convex quadratic optimization. We establish the first rigorous convergence analysis of both algorithms for least-squares and general strongly convex quadratic losses. By uncovering an exact equivalence between SNGD and the regularized Kaczmarz method, and leveraging tools from randomized linear algebra, we prove that SNGD achieves linear convergence under mild conditions. Moreover, we provide the first formal convergence guarantee for SPRING in any setting and the first proof that SPRING accelerates SNGD. Our analysis closes a fundamental theoretical gap for subsampled natural gradient-type methods in classical quadratic models, offering new insight into their empirical efficiency and a foundation for further theoretical development.
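Concretely, in the least-squares case the equivalence can be sketched as follows, in our own notation rather than necessarily the paper's exact parametrization: writing the problem as minimizing $\tfrac{1}{2}\|Ax - b\|^2$, with $S_k$ the row subset sampled at step $k$, $\eta$ a step size, and $\lambda$ a damping parameter, the subsampled natural gradient update takes the form

$$
x_{k+1} = x_k - \eta\, A_{S_k}^\top \left(A_{S_k} A_{S_k}^\top + \lambda I\right)^{-1}\left(A_{S_k} x_k - b_{S_k}\right),
$$

which is precisely a step of the regularized (block) Kaczmarz method applied to $Ax = b$.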
📝 Abstract
Subsampled natural gradient descent (SNGD) has shown impressive results for parametric optimization tasks in scientific machine learning, such as neural network wavefunctions and physics-informed neural networks, but it has lacked a theoretical explanation. We address this gap by analyzing the convergence of SNGD and its accelerated variant, SPRING, for idealized parametric optimization problems where the model is linear and the loss function is strongly convex and quadratic. In the special case of a least-squares loss, namely the standard linear least-squares problem, we prove that SNGD is equivalent to a regularized Kaczmarz method while SPRING is equivalent to an accelerated regularized Kaczmarz method. As a result, by leveraging existing analyses we obtain under mild conditions (i) the first fast convergence rate for SNGD, (ii) the first convergence guarantee for SPRING in any setting, and (iii) the first proof that SPRING can accelerate SNGD. In the case of a general strongly convex quadratic loss, we extend the analysis of the regularized Kaczmarz method to obtain a fast convergence rate for SNGD under stronger conditions, providing the first explanation for the effectiveness of SNGD outside of the least-squares setting. Overall, our results illustrate how tools from randomized linear algebra can shed new light on the interplay between subsampling and curvature-aware optimization strategies.
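To make the least-squares equivalence concrete, below is a minimal NumPy sketch, not the paper's code, of SNGD written in its regularized block-Kaczmarz form. The function name `sngd_least_squares` and the choices of step size `lr`, damping `lam`, and batch size are illustrative assumptions.

```python
import numpy as np

def sngd_least_squares(A, b, n_steps=500, batch=32, lam=1e-3, lr=1.0, seed=0):
    """Subsampled natural gradient descent on the least-squares loss
    (1/2)||A x - b||^2, written in its regularized block-Kaczmarz form:

        x <- x - lr * A_S^T (A_S A_S^T + lam I)^{-1} (A_S x - b_S)

    where S is a freshly sampled row subset at every step.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n)
    for _ in range(n_steps):
        S = rng.choice(m, size=min(batch, m), replace=False)  # subsample rows
        A_S, r_S = A[S], A[S] @ x - b[S]                      # local residual
        # Regularized pseudoinverse step: solve a small (batch x batch) system
        # rather than forming the full n x n Gram matrix.
        step = A_S.T @ np.linalg.solve(A_S @ A_S.T + lam * np.eye(len(S)), r_S)
        x -= lr * step
    return x

# Tiny consistency check on a random overdetermined system.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 20))
b = A @ rng.standard_normal(20)      # consistent system: an exact solution exists
x = sngd_least_squares(A, b)
print(np.linalg.norm(A @ x - b))     # residual should be near zero
```

The point of the Kaczmarz form is cost: each step only inverts a batch-sized system built from the sampled rows, which is what makes the subsampled natural gradient step cheap relative to a full curvature solve.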