🤖 AI Summary
This work revisits the convergence analysis of AdaGrad-type adaptive methods for stochastic gradient descent (SGD). The authors develop a unified theoretical framework under anisotropic (matrix) smoothness and gradient noise assumptions that recovers state-of-the-art convergence results for AdaGrad-Norm, AdaGrad, and ASGO/One-sided Shampoo as special cases. The same framework establishes a fundamental connection between the recently proposed Scion and DASGO algorithms and yields the first convergence guarantees for DASGO. Building on this analysis, the authors prove that Nesterov momentum provably accelerates AdaGrad-type methods such as AdaGrad and DASGO beyond the best-known rates, revealing a complementary interplay between diagonal preconditioning and momentum. This gives the first theoretical justification that such algorithms benefit from both components simultaneously, offering a possible explanation for the practical efficiency of Adam.
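As a rough illustration of the update family discussed above (the notation below is ours, not taken from the paper), an AdaGrad-type step applies a diagonal preconditioner built from past stochastic gradients:

```latex
% Illustrative notation (not from the paper): x_t are iterates, g_t a stochastic
% gradient, \eta a step size, and D_t the AdaGrad-style diagonal preconditioner.
D_t \;=\; \operatorname{diag}\!\Big(\textstyle\sum_{s=1}^{t} g_s \odot g_s\Big)^{1/2} + \epsilon I,
\qquad
x_{t+1} \;=\; x_t \;-\; \eta\, D_t^{-1} g_t .
```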
📝 Abstract
In this paper, we revisit stochastic gradient descent (SGD) with AdaGrad-type preconditioning. Our contributions are twofold. First, we develop a unified convergence analysis of SGD with adaptive preconditioning under anisotropic (matrix) smoothness and noise assumptions. This allows us to recover state-of-the-art convergence results for several popular adaptive gradient methods, including AdaGrad-Norm, AdaGrad, and ASGO/One-sided Shampoo. In addition, we establish a fundamental connection between two recently proposed algorithms, Scion and DASGO, and provide the first theoretical guarantees for the latter. Second, we show that the convergence of methods like AdaGrad and DASGO can be provably accelerated beyond the best-known rates using Nesterov momentum. Consequently, we obtain the first theoretical justification that AdaGrad-type algorithms can simultaneously benefit from both diagonal preconditioning and momentum, which may ultimately explain the practical efficiency of Adam.
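To make the "diagonal preconditioning plus Nesterov momentum" combination concrete, here is a minimal NumPy sketch. It is a generic AdaGrad-style step with a Nesterov look-ahead, not the exact accelerated DASGO scheme analyzed in the paper; the function name and the parameters lr, beta, and eps are illustrative choices of ours.

```python
import numpy as np

def adagrad_nesterov_step(x, v, accum, grad_fn, lr=0.1, beta=0.9, eps=1e-8):
    """One illustrative AdaGrad-style step with Nesterov (look-ahead) momentum.

    A generic sketch of diagonal preconditioning combined with Nesterov momentum,
    not the specific algorithm or step sizes analyzed in the paper.
    """
    # Nesterov look-ahead: evaluate the stochastic gradient at the extrapolated point.
    g = grad_fn(x + beta * v)
    # AdaGrad-style diagonal preconditioner: accumulate squared gradients per coordinate.
    accum = accum + g * g
    precond_grad = g / (np.sqrt(accum) + eps)
    # Momentum update on the preconditioned gradient, then the parameter step.
    v = beta * v - lr * precond_grad
    x = x + v
    return x, v, accum

# Usage: minimize a simple anisotropic quadratic f(x) = 0.5 * sum(a * x**2).
a = np.array([100.0, 1.0])
grad_fn = lambda x: a * x
x, v, accum = np.array([1.0, 1.0]), np.zeros(2), np.zeros(2)
for _ in range(200):
    x, v, accum = adagrad_nesterov_step(x, v, accum, grad_fn)
print(x)  # both coordinates shrink toward the minimizer at the origin
```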