Convergence Analysis of alpha-SVRG under Strong Convexity

📅 2025-03-16
🤖 AI Summary
This paper investigates the convergence behavior of the α-SVRG algorithm for strongly convex objective functions. We develop a unified analytical framework that yields, for the first time, an explicit, parameterized convergence rate expression under a fixed step size, revealing how the convergence speed varies continuously with the noise-tuning parameter α. Theoretically, we prove the existence of an optimal α such that α-SVRG strictly outperforms both standard SGD and classical SVRG in convergence rate. Moreover, we provide the first theoretical confirmation that moderate stochastic noise injection—controlled by α—accelerates convergence (“noise-beneficial effect”). Numerical experiments on linear regression corroborate the theoretically predicted acceleration. Our core contribution is the establishment of the first unified convergence analysis for α-SVRG, precisely characterizing the conditions and mechanisms under which it surpasses classical variance-reduction methods.

📝 Abstract
Stochastic first-order methods for empirical risk minimization employ gradient approximations based on sampled data in lieu of exact gradients. Such constructions introduce noise into the learning dynamics, which can be corrected through variance-reduction techniques. There is increasing evidence in the literature that in many modern learning applications noise can have a beneficial effect on optimization and generalization. To this end, the recently proposed variance-reduction technique alpha-SVRG [Yin et al., 2023] allows for fine-grained control of the level of residual noise in the learning dynamics, and has been reported to empirically outperform both SGD and SVRG in modern deep learning scenarios. By focusing on strongly convex environments, we first provide a unified convergence rate expression for alpha-SVRG under a fixed learning rate, which reduces to that of either SGD or SVRG by setting alpha=0 or alpha=1, respectively. We show that alpha-SVRG has a faster convergence rate compared to SGD and SVRG under a suitable choice of alpha. Simulation results on linear regression validate our theory.
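The interpolation described in the abstract can be sketched as follows. This is a minimal illustration on a synthetic least-squares problem, not the paper's implementation; the alpha-SVRG estimator form (per-sample gradient minus alpha times the snapshot correction) follows Yin et al. [2023], and all names (`alpha_svrg`, `n_epochs`, learning rate, problem sizes) are illustrative assumptions. Setting `alpha=0` recovers SGD and `alpha=1` recovers classical SVRG.

```python
import numpy as np

# Synthetic strongly convex linear regression: minimize
#   F(x) = (1/n) * sum_i (1/2) * (a_i^T x - b_i)^2
rng = np.random.default_rng(0)
n, d = 200, 5
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.01 * rng.standard_normal(n)

def grad_i(x, i):
    """Gradient of the i-th per-sample loss (1/2)(a_i^T x - b_i)^2."""
    return A[i] * (A[i] @ x - b[i])

def full_grad(x):
    """Full-batch gradient of F at x."""
    return A.T @ (A @ x - b) / n

def alpha_svrg(alpha, lr=0.01, n_epochs=30):
    """alpha-SVRG with a fixed step size (illustrative hyperparameters).

    Stochastic gradient estimate at iterate x with snapshot x_snap:
        g = grad_i(x) - alpha * (grad_i(x_snap) - full_grad(x_snap))
    alpha=0 -> plain SGD; alpha=1 -> classical SVRG.
    """
    x = np.zeros(d)
    for _ in range(n_epochs):
        x_snap = x.copy()          # snapshot at the start of the epoch
        mu = full_grad(x_snap)     # full gradient at the snapshot
        for _ in range(n):
            i = rng.integers(n)
            g = grad_i(x, i) - alpha * (grad_i(x_snap, i) - mu)
            x -= lr * g
    return x

x_hat = alpha_svrg(alpha=0.5)  # intermediate alpha keeps partial residual noise
print(np.linalg.norm(x_hat - x_true))
```

Note that the estimator is unbiased for every alpha, since the snapshot correction has zero mean; alpha only rescales how much of the residual noise is cancelled, which is the knob the paper's convergence analysis parameterizes.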
Problem

Research questions and friction points this paper is trying to address.

Analyzes convergence of alpha-SVRG in strongly convex settings.
Compares alpha-SVRG performance with SGD and SVRG.
Validates theory with linear regression simulations.
Innovation

Methods, ideas, or system contributions that make the work stand out.

alpha-SVRG controls residual noise finely.
alpha-SVRG outperforms SGD and SVRG.
Unified convergence rate for alpha-SVRG.
Sean Xiao
Department of Electrical Engineering, Imperial College London
Sangwoo Park
Department of Electrical Engineering, Imperial College London
Stefan Vlaski
Imperial College London
Distributed Optimization · Machine Learning · Statistical Signal Processing · Multi-Agent Systems