Methods with Local Steps and Random Reshuffling for Generally Smooth Non-Convex Federated Optimization

📅 2024-12-03

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

Non-convex objectives in federated learning often violate the standard smoothness assumption. To address this, the paper proposes and analyzes new methods that combine local steps, partial client participation, and Random Reshuffling under the more general $(L_0, L_1)$-smoothness condition. The methods rely on gradient clipping and a coordinated choice of client and server stepsizes, and they require no restrictive assumptions beyond generalized smoothness. The paper also gives the first analysis of these methods under the Polyak–Łojasiewicz (PL) condition. The resulting theory is consistent with known results for standard smooth problems, and the experiments support the theoretical insights.

📝 Abstract

Non-convex Machine Learning problems typically do not adhere to the standard smoothness assumption. Based on empirical findings, Zhang et al. (2020b) proposed a more realistic generalized $(L_0, L_1)$-smoothness assumption, though it remains largely unexplored. Many existing algorithms designed for standard smooth problems need to be revised. However, in the context of Federated Learning, only a few works address this problem but rely on additional limiting assumptions. In this paper, we address this gap in the literature: we propose and analyze new methods with local steps, partial participation of clients, and Random Reshuffling without extra restrictive assumptions beyond generalized smoothness. The proposed methods are based on the proper interplay between clients' and server's stepsizes and gradient clipping. Furthermore, we perform the first analysis of these methods under the Polyak-Łojasiewicz condition. Our theory is consistent with the known results for standard smooth problems, and our experimental results support the theoretical insights.
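
For reference, the two conditions named in the abstract are commonly stated as below. This is the usual form for a twice-differentiable objective $f$; the symbols $\mu > 0$ and $f^*$ (the minimum value of $f$) are assumed notation, and the paper's exact definitions may differ.

$$
\|\nabla^2 f(x)\| \le L_0 + L_1 \|\nabla f(x)\| \quad \text{for all } x \qquad \text{(generalized } (L_0, L_1)\text{-smoothness)}
$$

$$
\|\nabla f(x)\|^2 \ge 2\mu \, \bigl(f(x) - f^*\bigr) \quad \text{for all } x \qquad \text{(Polyak-Łojasiewicz condition)}
$$

Setting $L_1 = 0$ in the first inequality gives a uniform bound on the Hessian norm, which is the standard smoothness assumption with constant $L_0$. In that sense, results under generalized smoothness can be compared with results for standard smooth problems.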

Problem

Research questions and friction points this paper is trying to address.

Address non-convex optimization under generalized smoothness assumptions

Develop federated learning methods without restrictive assumptions

Analyze the convergence of these methods under the Polyak-Łojasiewicz condition

Innovation

Methods, ideas, or system contributions that make the work stand out.

Local steps and partial client participation

Random Reshuffling without extra assumptions

Gradient clipping and stepsize interplay
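
The three ingredients listed above can be illustrated with a small sketch. It is not the paper's algorithm: the stepsize rules and the exact placement of clipping are not given in this summary, so the sketch assumes a generic scheme in which each sampled client runs one reshuffled pass over its data with a client stepsize, and the server applies a clipped step to the averaged update. All names (`clip`, `local_epoch`, `federated_round`, `run_demo`) and the least-squares toy problem are hypothetical.

```python
import numpy as np

def clip(v, c):
    """Rescale v so that its Euclidean norm is at most c."""
    norm = np.linalg.norm(v)
    return v if norm <= c else v * (c / norm)

def local_epoch(x, A, b, local_lr, rng):
    """One pass over a client's samples in a freshly shuffled order."""
    y = x.copy()
    for i in rng.permutation(len(b)):
        # Gradient of the per-sample loss 0.5 * (a_i^T y - b_i)^2
        grad_i = (A[i] @ y - b[i]) * A[i]
        y -= local_lr * grad_i
    return y

def federated_round(x, clients, cohort_size, local_lr, server_lr, clip_level, rng):
    """Partial participation: only a random cohort of clients takes local steps."""
    cohort = rng.choice(len(clients), size=cohort_size, replace=False)
    updates = []
    for m in cohort:
        A, b = clients[m]
        y = local_epoch(x, A, b, local_lr, rng)
        # Rescale the client's displacement so it behaves like a gradient estimate
        updates.append((x - y) / (local_lr * len(b)))
    g = np.mean(updates, axis=0)
    # Server step with clipping (an assumed placement of the clipping operator)
    return x - server_lr * clip(g, clip_level)

def mean_loss(x, clients):
    """Average of the clients' mean squared-error losses."""
    return float(np.mean([0.5 * np.mean((A @ x - b) ** 2) for A, b in clients]))

def run_demo(rounds=200, seed=0):
    """Toy least-squares problem with 10 clients; returns (initial, final) loss."""
    rng = np.random.default_rng(seed)
    x_true = rng.normal(size=5)
    clients = []
    for _ in range(10):
        A = rng.normal(size=(20, 5))
        clients.append((A, A @ x_true))
    x = np.zeros(5)
    initial = mean_loss(x, clients)
    for _ in range(rounds):
        x = federated_round(x, clients, cohort_size=4, local_lr=0.01,
                            server_lr=0.1, clip_level=1.0, rng=rng)
    return initial, mean_loss(x, clients)
```

Calling `run_demo()` returns the loss before and after training on the toy problem. The client stepsize (`local_lr`), server stepsize (`server_lr`), and clipping level (`clip_level`) are set by hand here, whereas the paper's contribution is a principled coupling of these quantities.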
