🤖 AI Summary
This paper addresses the convergence analysis of stochastic gradient descent with random reshuffling (SGD-RR) in the absence of Lipschitz smoothness—a key limitation of existing theory that excludes many nonsmooth machine learning models. To overcome this, the authors propose a novel adaptive step-size scheme. Under only mild assumptions—namely, bounded gradient variance and continuity of the objective function (without requiring Lipschitz gradients)—the analysis establishes, for the first time, unified optimal convergence rates for SGD-RR across nonconvex, strongly convex, and non-strongly convex settings, applicable to both random and arbitrary reshuffling orders. The analysis breaks the conventional reliance on gradient smoothness, substantially broadening the theoretical scope and practical relevance of reshuffling-based optimization. Extensive experiments on standard machine learning tasks validate the algorithm's effectiveness and robustness.
📝 Abstract
Shuffling-type gradient methods are favored in practice for their simplicity and rapid empirical performance. Despite extensive development of convergence guarantees under various assumptions in recent years, most existing results require the Lipschitz smoothness condition, which is often not met in common machine learning models. We highlight this issue with specific counterexamples. To address this gap, we revisit the convergence rates of shuffling-type gradient methods without assuming Lipschitz smoothness. Using our stepsize strategy, the shuffling-type gradient algorithm not only converges under weaker assumptions but also matches the current best-known convergence rates, thereby broadening its applicability. We prove convergence rates for the nonconvex, strongly convex, and non-strongly convex cases, each under both random reshuffling and arbitrary shuffling schemes, assuming only a general bounded variance condition. Numerical experiments further validate the performance of our shuffling-type gradient algorithm, underscoring its practical efficacy.
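To make the algorithm family concrete, here is a minimal sketch of a shuffling-type gradient method under random reshuffling: each epoch samples a fresh permutation of the component functions and takes one pass of gradient steps in that order. The paper's actual adaptive stepsize strategy is not specified in the abstract, so a generic diminishing schedule (`step_fn`) stands in as a hypothetical placeholder; `grad_i` and the toy quadratic problem are likewise illustrative assumptions, not the paper's experiments.

```python
import numpy as np

def sgd_random_reshuffling(grad_i, x0, n, epochs, step_fn, rng=None):
    """Shuffling-type SGD: each epoch visits every component gradient
    exactly once in a freshly sampled random order (random reshuffling).

    grad_i(x, i): gradient of the i-th component f_i at x.
    step_fn(t):   stepsize for epoch t (placeholder for the paper's
                  adaptive scheme, which is not given in the abstract).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    for t in range(epochs):
        eta = step_fn(t)
        perm = rng.permutation(n)   # fresh permutation every epoch
        for i in perm:              # one full pass over all n components
            x = x - eta * grad_i(x, i)
    return x

# Toy problem: f(x) = (1/n) * sum_i 0.5 * (x - a_i)^2, minimizer = mean(a).
a = np.array([1.0, 2.0, 3.0, 4.0])
grad = lambda x, i: x - a[i]
x_star = sgd_random_reshuffling(grad, x0=0.0, n=4, epochs=200,
                                step_fn=lambda t: 1.0 / (t + 2))
```

For an "arbitrary shuffling" scheme, the only change is that `perm` may be any (possibly adversarial) permutation rather than a uniformly random one; the abstract claims rates for both regimes.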