AI Summary
This work addresses the limited understanding of generalization in variance-reduced optimization methods, which have been analyzed predominantly in terms of convergence. Using algorithmic stability, the paper establishes sharp, data-dependent generalization error bounds for SVRG in both convex and strongly convex settings, the first such guarantees to date. The key innovation lies in decomposing the SVRG update into an SGD-like step and a zero-mean correction term, and in designing a novel Lyapunov function to handle the additional gradient terms introduced by the reference point. This framework yields optimal excess population risk bounds in both settings and extends to other variance-reduction algorithms such as SAGA, thereby clarifying the connection between optimization dynamics and generalization performance.
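To make the decomposition concrete, the standard SVRG inner update can be written so that the two parts are visible. This is a sketch in assumed notation, which may differ from the paper's: $w_t$ is the current iterate, $\tilde{w}$ the reference point, $F_S = \frac{1}{n}\sum_{i=1}^{n} f_i$ the empirical risk, $i_t$ an index drawn uniformly from $\{1,\dots,n\}$, and $\eta$ the step size.

$$
w_{t+1}
= w_t - \eta\big(\nabla f_{i_t}(w_t) - \nabla f_{i_t}(\tilde{w}) + \nabla F_S(\tilde{w})\big)
= \underbrace{w_t - \eta\,\nabla f_{i_t}(w_t)}_{\text{SGD-like step}}
\;+\; \eta\,\underbrace{\big(\nabla f_{i_t}(\tilde{w}) - \nabla F_S(\tilde{w})\big)}_{\text{correction term}}
$$

$$
\mathbb{E}_{i_t}\big[\nabla f_{i_t}(\tilde{w}) - \nabla F_S(\tilde{w})\big]
= \frac{1}{n}\sum_{i=1}^{n}\nabla f_i(\tilde{w}) - \nabla F_S(\tilde{w}) = 0
$$

The expectation is taken over $i_t$ with $\tilde{w}$ held fixed, which is why the correction term is zero-mean within an epoch.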
Abstract
Variance reduction (VR) methods employ stochastic gradients with decreasing variance, and they have been widely applied to solve large-scale optimization problems in machine learning because of their efficiency. Existing theoretical studies of VR methods focus mainly on convergence analysis, leaving the generalization behavior largely unexplored. In this paper, we bridge this gap by developing the first non-vacuous generalization analysis of a representative VR method, Stochastic Variance Reduced Gradient (SVRG), through the lens of algorithmic stability. In particular, we establish sharp stability bounds for SVRG in both convex and strongly convex settings by exploiting its algorithmic structure. The obtained bounds are data-dependent, because the training errors are incorporated along the trajectory. Our analysis clarifies the interplay between optimization and generalization, leading to optimal excess population risk bounds in both settings. Our approach differs substantially from existing analyses of stochastic algorithms in the sense that we decompose the SVRG update as an SGD-like step plus a zero-mean correction term and then introduce novel Lyapunov functions to absorb the additional gradient terms induced by the reference points. Our analytical framework can be generalized to other VR methods, and we demonstrate this by extending it to the well-known Stochastic Average Gradient Accelerated (SAGA) method.
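The split of the SVRG update into an SGD-like step plus a zero-mean correction can be checked numerically. The following is a minimal sketch, not the paper's implementation: it assumes a synthetic least-squares problem, and the names `make_least_squares`, `grad_i`, `full_grad`, `svrg_direction`, and `svrg`, as well as the step-size rule, are hypothetical choices made for illustration.

```python
import numpy as np


def make_least_squares(n=50, d=5, seed=0):
    """Synthetic data for f_i(w) = 0.5 * (x_i @ w - y_i)**2 (illustrative only)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))
    y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
    return X, y


def grad_i(w, X, y, i):
    """Gradient of the i-th loss f_i at w."""
    return (X[i] @ w - y[i]) * X[i]


def full_grad(w, X, y):
    """Gradient of the empirical risk F_S(w) = (1/n) * sum_i f_i(w)."""
    return X.T @ (X @ w - y) / len(y)


def svrg_direction(w, w_ref, mu_ref, X, y, i):
    """Return the two parts of the SVRG direction for sample i.

    sgd_like   : the plain stochastic gradient at the current iterate
    correction : mu_ref - grad_i(w_ref), which averages to zero over i
                 because mu_ref is the full gradient at w_ref
    """
    sgd_like = grad_i(w, X, y, i)
    correction = mu_ref - grad_i(w_ref, X, y, i)
    return sgd_like, correction


def svrg(X, y, epochs=100, inner_steps=None, lr=None, seed=1):
    """Plain SVRG that restarts each epoch from the last inner iterate."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    if inner_steps is None:
        inner_steps = 2 * n
    if lr is None:
        # Assumed conservative step size: 0.1 / (largest per-sample smoothness).
        lr = 0.1 / np.max(np.sum(X ** 2, axis=1))
    w = np.zeros(d)
    for _ in range(epochs):
        w_ref = w.copy()                 # reference point for this epoch
        mu_ref = full_grad(w_ref, X, y)  # full gradient at the reference point
        for _ in range(inner_steps):
            i = rng.integers(n)
            sgd_like, correction = svrg_direction(w, w_ref, mu_ref, X, y, i)
            w = w - lr * (sgd_like + correction)
    return w


if __name__ == "__main__":
    X, y = make_least_squares()
    w_hat = svrg(X, y)
    print("gradient norm at SVRG output:", np.linalg.norm(full_grad(w_hat, X, y)))
```

Averaging `correction` over all indices `i` at a fixed reference point gives the zero vector up to floating-point error, which is the zero-mean property the analysis relies on. Averaging `sgd_like + correction` recovers the full gradient at `w`, so the SVRG direction is unbiased.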