🤖 AI Summary
Existing convergence analyses of stochastic gradient descent (SGD) offer no guarantees when the data permutations used across epochs depend on one another, leaving a gap in the theory of shuffle-based optimization.
Method: We propose a unified theoretical framework built on a general assumption that quantifies the strength of inter-epoch permutation dependence, enabling rigorous analysis of four canonical categories of permutation-based SGD: arbitrary, independent (e.g., Random Reshuffling), one-permutation, and dependent permutations. We further extend this framework to federated learning, deriving convergence bounds for regularized client participation with arbitrary client orderings.
Contribution: Our work overcomes a key limitation of prior unified analyses, which fail under dependent permutations, by establishing a broadly applicable convergence theory for permutation-based SGD. It provides rigorous convergence guarantees that hold for arbitrary permutations, strengthening the theoretical foundation for permutation-aware optimization and informing the design and interpretation of client ordering strategies in federated learning.
📝 Abstract
We aim to provide a unified convergence analysis for permutation-based Stochastic Gradient Descent (SGD), where data examples are permuted before each epoch. By examining the relations among permutations, we categorize existing permutation-based SGD algorithms into four categories: Arbitrary Permutations, Independent Permutations (including Random Reshuffling), One Permutation (including Incremental Gradient, Shuffle Once and Nice Permutation) and Dependent Permutations (including GraBs; Lu et al., 2022; Cooper et al., 2023). Existing unified analyses fail to encompass the Dependent Permutations category due to the inter-epoch dependencies in its permutations. In this work, we propose a general assumption that captures the inter-epoch permutation dependencies. Using the general assumption, we develop a unified framework for permutation-based SGD with arbitrary permutations of examples, incorporating all the aforementioned representative algorithms. Furthermore, we adapt our framework from example ordering in SGD to client ordering in Federated Learning (FL). Specifically, we develop a unified framework for regularized-participation FL with arbitrary permutations of clients.
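To make the categories concrete, the sketch below (not from the paper; all names and the toy objective are illustrative) runs permutation-based SGD on a simple quadratic, where the only difference between algorithms is how the epoch's permutation is produced: a fresh shuffle each epoch corresponds to Independent Permutations (Random Reshuffling), while reusing a single shuffle corresponds to One Permutation (Shuffle Once).

```python
import random

def permutation_sgd(grads, x0, lr, epochs, permute):
    """Run SGD where `permute(epoch, prev_perm)` supplies each epoch's example order."""
    n = len(grads)
    x = x0
    perm = list(range(n))
    for epoch in range(epochs):
        perm = permute(epoch, perm)  # the permutation rule is the only moving part
        for i in perm:
            x = x - lr * grads[i](x)
    return x

# Toy objective: f(x) = (1/n) * sum_i (x - a_i)^2 / 2, per-example gradient x - a_i.
a = [1.0, 2.0, 3.0, 4.0]
grads = [(lambda ai: (lambda x: x - ai))(ai) for ai in a]

rng = random.Random(0)

# Independent Permutations (Random Reshuffling): fresh shuffle every epoch.
rr = lambda epoch, prev: rng.sample(range(len(a)), len(a))

# One Permutation (Shuffle Once): shuffle once up front, reuse it forever.
fixed = rng.sample(range(len(a)), len(a))
so = lambda epoch, prev: fixed

x_rr = permutation_sgd(grads, 0.0, lr=0.1, epochs=50, permute=rr)
x_so = permutation_sgd(grads, 0.0, lr=0.1, epochs=50, permute=so)
# Both iterates end up near the minimizer, the mean of a (2.5).
```

Dependent Permutations (e.g., GraB) would plug in a `permute` rule that uses gradient information from the previous epoch to choose the next order, which is exactly the inter-epoch dependence the paper's general assumption is designed to capture.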