🤖 AI Summary
This paper addresses the convergence analysis of stochastic gradient descent with random reshuffling (SGD-RR) in the absence of Lipschitz smoothness—a key limitation of existing theory that excludes many nonsmooth machine learning models. To overcome this, the authors propose a novel adaptive step-size scheme. Under only mild assumptions—namely, bounded gradient variance and continuity of the objective function (without requiring Lipschitz gradients)—the analysis establishes, for the first time, unified optimal convergence rates for SGD-RR across nonconvex, strongly convex, and non-strongly convex settings, applicable to both random and arbitrary reshuffling orders. The analysis breaks the conventional reliance on gradient smoothness, substantially broadening the theoretical scope and practical relevance of reshuffling-based optimization. Extensive experiments on standard machine learning tasks validate the algorithm's effectiveness and robustness.
📝 Abstract
Shuffling-type gradient methods are favored in practice for their simplicity and rapid empirical performance. Despite extensive development of convergence guarantees under various assumptions in recent years, most existing results require the Lipschitz smoothness condition, which is often not met in common machine learning models. We highlight this issue with specific counterexamples. To address this gap, we revisit the convergence rates of shuffling-type gradient methods without assuming Lipschitz smoothness. Using our stepsize strategy, the shuffling-type gradient algorithm not only converges under weaker assumptions but also matches the current best-known convergence rates, thereby broadening its applicability. We prove convergence rates for the nonconvex, strongly convex, and non-strongly convex cases, each under both random reshuffling and arbitrary shuffling schemes, assuming only a general bounded variance condition. Numerical experiments further validate the performance of our shuffling-type gradient algorithm, underscoring its practical efficacy.
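To make the algorithm family concrete, here is a minimal sketch of a shuffling-type gradient method under random reshuffling: each epoch samples a fresh permutation of the component functions and takes one pass of gradient steps in that order. The paper's actual adaptive stepsize strategy is not specified in the abstract, so a generic diminishing schedule (`step_fn`) stands in as a hypothetical placeholder; `grad_i` and the toy quadratic problem are likewise illustrative assumptions, not the paper's experiments.

```python
import numpy as np

def sgd_random_reshuffling(grad_i, x0, n, epochs, step_fn, rng=None):
    """Shuffling-type SGD: each epoch visits every component gradient
    exactly once in a freshly sampled random order (random reshuffling).

    grad_i(x, i): gradient of the i-th component f_i at x.
    step_fn(t):   stepsize for epoch t (placeholder for the paper's
                  adaptive scheme, which is not given in the abstract).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    for t in range(epochs):
        eta = step_fn(t)
        perm = rng.permutation(n)   # fresh permutation every epoch
        for i in perm:              # one full pass over all n components
            x = x - eta * grad_i(x, i)
    return x

# Toy problem: f(x) = (1/n) * sum_i 0.5 * (x - a_i)^2, minimizer = mean(a).
a = np.array([1.0, 2.0, 3.0, 4.0])
grad = lambda x, i: x - a[i]
x_star = sgd_random_reshuffling(grad, x0=0.0, n=4, epochs=200,
                                step_fn=lambda t: 1.0 / (t + 2))
```

For an "arbitrary shuffling" scheme, the only change is that `perm` may be any (possibly adversarial) permutation rather than a uniformly random one; the abstract claims rates for both regimes.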