Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation

📅 2024-04-03
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the suboptimal convergence rates of stochastic accelerated gradient descent (SAGD) relative to standard SGD in convex and strongly convex optimization. We propose a generalized stochastic Nesterov acceleration framework. Methodologically, we bring estimating sequences, a classical tool for analyzing deterministic acceleration, into the stochastic setting, working under interpolation conditions and the strong growth assumption. Our contributions are threefold: (1) we reduce the dependence of the convergence rate on the growth constant ρ from ρ to √ρ, substantially mitigating the "acceleration worse than SGD" phenomenon; (2) we derive tighter convergence bounds that approach the rate of deterministic Nesterov acceleration for large condition numbers; and (3) the framework applies to any stochastic gradient subroutine that makes sufficient progress in expectation. In the strongly convex regime, the analysis establishes genuine stochastic acceleration. A minimal sketch of the accelerated-SGD template appears below.
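For concreteness, here is a minimal numpy sketch of the accelerated-SGD template on a least-squares problem constructed to satisfy interpolation. The step size and momentum are heuristic stand-ins chosen for this toy problem, not the estimating-sequence schedule analyzed in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Least-squares problem that satisfies interpolation: b = A @ x_star exactly,
# so every per-sample loss is minimized at x_star and gradient noise vanishes there.
n, d = 50, 20
A = rng.standard_normal((n, d))
x_star = rng.standard_normal(d)
b = A @ x_star

mu = np.linalg.eigvalsh(A.T @ A)[0] / n   # strong convexity of f(x) = ||Ax - b||^2 / (2n)
eta = 1.0 / np.max(np.sum(A**2, axis=1))  # heuristic step: non-expansive for every sampled row
q = np.sqrt(mu * eta)                     # momentum matched to the shrunken stochastic step
beta = (1.0 - q) / (1.0 + q)

def stoch_grad(y):
    """Gradient of one uniformly sampled component f_i(y) = (A[i] @ y - b[i])^2 / 2."""
    i = rng.integers(n)
    return (A[i] @ y - b[i]) * A[i]

def accelerated_sgd(x0, eta, beta, iters):
    """Fixed-parameter stochastic Nesterov iteration:
    x_{k+1} = y_k - eta * g(y_k);   y_{k+1} = x_{k+1} + beta * (x_{k+1} - x_k)."""
    x, y = x0.copy(), x0.copy()
    for _ in range(iters):
        x_next = y - eta * stoch_grad(y)
        y = x_next + beta * (x_next - x)
        x = x_next
    return x

x_hat = accelerated_sgd(np.zeros(d), eta, beta, iters=20000)
print("distance to x*:", np.linalg.norm(x_hat - x_star))
```

Because the problem interpolates, the stochastic gradients all vanish at the solution, so the iterates can converge to x_star itself rather than to a noise ball around it.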

📝 Abstract
We prove new convergence rates for a generalized version of stochastic Nesterov acceleration under interpolation conditions. Unlike previous analyses, our approach accelerates any stochastic gradient method which makes sufficient progress in expectation. The proof, which proceeds using the estimating sequences framework, applies to both convex and strongly convex functions and is easily specialized to accelerated SGD under the strong growth condition. In this special case, our analysis reduces the dependence on the strong growth constant from $\rho$ to $\sqrt{\rho}$ as compared to prior work. This improvement is comparable to a square-root of the condition number in the worst case and addresses criticism that guarantees for stochastic acceleration could be worse than those for SGD.
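For context, the strong growth and interpolation conditions named in the abstract take the following standard form in this literature (the notation here is the conventional one, not necessarily the paper's own):

```latex
% Strong growth condition with constant \rho:
\mathbb{E}_{\xi}\!\left[\|\nabla f(x;\xi)\|^{2}\right]
  \;\le\; \rho\,\|\nabla f(x)\|^{2}
  \qquad \text{for all } x.
% At any minimizer x^{*} we have \nabla f(x^{*}) = 0, so the condition forces
% every stochastic gradient to vanish at x^{*}: the interpolation property.
```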
Problem

Research questions and friction points this paper is trying to address.

Stochastic Accelerated Gradient Descent
Convex and Strongly Convex Functions
Suboptimal Acceleration Guarantees
Innovation

Methods, ideas, or system contributions that make the work stand out.

SGD under Interpolation
Reduced Growth-Constant Dependence
Tighter Convergence Bounds