Adjusted Shuffling SARAH: Advancing Complexity Analysis via Dynamic Gradient Weighting

📅 2025-06-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a theoretical bottleneck in shuffling-based variance-reduction methods for strongly convex stochastic optimization, aiming to close the gap in gradient complexity analysis between uniform sampling and shuffling. We propose a dynamic weighted gradient update scheme that integrates SARAH-type variance reduction with shuffling sampling, and design an Inexact variant that avoids full-batch gradient computations. For the first time, our method achieves the best-known gradient complexity $O(n + \sqrt{n/\varepsilon})$ for *any* shuffling order. The Inexact variant preserves linear convergence while reducing total computational complexity below $O(n/\varepsilon)$ when the sample size is very large. The core innovation lies in unifying enhanced exploration capability with improved convergence efficiency, thereby establishing a tighter and more general theoretical foundation for shuffling-based algorithms.

📝 Abstract
In this paper, we propose Adjusted Shuffling SARAH, a novel algorithm that integrates shuffling techniques with the well-known variance-reduced algorithm SARAH while dynamically adjusting the stochastic gradient weights in each update to enhance exploration. Our method achieves the best-known gradient complexity for shuffling variance reduction methods in a strongly convex setting. This result applies to any shuffling technique, which narrows the gap in the complexity analysis of variance reduction methods between uniform sampling and shuffling data. Furthermore, we introduce Inexact Adjusted Reshuffling SARAH, an inexact variant of Adjusted Shuffling SARAH that eliminates the need for full-batch gradient computations. This algorithm retains the same linear convergence rate as Adjusted Shuffling SARAH while showing an advantage in total complexity when the sample size is very large.
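To make the abstract's description concrete, here is a minimal sketch of a SARAH-style recursive gradient estimator run over reshuffled epochs. The per-step weight `gamma` is a hypothetical placeholder for the paper's dynamic gradient adjustment (whose exact form is not given here); with `gamma = 1` the recursion reduces to plain shuffling SARAH, and the full-batch anchor at each epoch start is what the Inexact variant replaces with an estimate.

```python
import numpy as np

def shuffling_sarah(grad_i, n, w0, lr=0.05, epochs=30, seed=0):
    """SARAH-style recursive estimator with a fresh shuffle each epoch.

    grad_i(w, i) returns the gradient of component f_i at w.
    `gamma` is a hypothetical stand-in for the paper's dynamic weight.
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float).copy()
    for _ in range(epochs):
        # Anchor: full-batch gradient at the epoch's starting point
        # (the Inexact variant would avoid this exact computation).
        v = np.mean([grad_i(w, i) for i in range(n)], axis=0)
        w_prev, w = w.copy(), w - lr * v
        for i in rng.permutation(n):       # any shuffling order is allowed
            gamma = 1.0                    # placeholder dynamic weight
            v = gamma * (grad_i(w, i) - grad_i(w_prev, i)) + v
            w_prev, w = w.copy(), w - lr * v
    return w

# Toy strongly convex problem: least squares, f_i(w) = 0.5*(a_i.w - b_i)^2.
rng = np.random.default_rng(1)
n, d = 40, 3
A = rng.standard_normal((n, d))
w_star = np.array([1.0, -2.0, 0.5])
b = A @ w_star
grad = lambda w, i: A[i] * (A[i] @ w - b[i])

w_hat = shuffling_sarah(grad, n, np.zeros(d))  # converges toward w_star
```

On this toy problem the iterates converge linearly to the least-squares solution, illustrating the linear rate the paper proves in the strongly convex setting.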
Problem

Research questions and friction points this paper is trying to address.

Enhance exploration via dynamic gradient weighting
Achieve best-known gradient complexity for shuffling methods
Eliminate full-batch gradient computations in large datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic gradient weighting enhances exploration
Best-known gradient complexity for shuffling
Inexact variant eliminates full-batch computations