On the Limits of Momentum in Decentralized and Federated Optimization

📅 2025-11-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates the convergence limitations of momentum-based methods in decentralized and federated learning under unbounded statistical heterogeneity and cyclic client participation. Method: Through rigorous theoretical analysis and numerical validation, we establish—under mild assumptions—that momentum fails to overcome the fundamental bias induced by statistical heterogeneity. Specifically, when the step size decays faster than Θ(1/t), the limiting point depends on both the initial iterate and the heterogeneity bound, precluding asymptotic unbiasedness. Results: We provide the first formal proof that momentum cannot eliminate heterogeneity-induced bias in such settings, even with careful tuning. Empirical evaluation within a distributed SGD framework corroborates the theoretical findings. This work reveals an intrinsic limitation of momentum in heterogeneous environments and establishes critical theoretical boundaries and design principles for developing robust optimization algorithms in decentralized and federated learning.

📝 Abstract
Recent works have explored the use of momentum in local methods to enhance distributed SGD. This is particularly appealing in Federated Learning (FL), where momentum intuitively appears as a solution to mitigate the effects of statistical heterogeneity. Despite recent progress in this direction, it is still unclear if momentum can guarantee convergence under unbounded heterogeneity in decentralized scenarios, where only some workers participate at each round. In this work, we analyze momentum under cyclic client participation and theoretically prove that it remains inevitably affected by statistical heterogeneity. Similarly to SGD, we prove that decreasing step-sizes do not help either: in fact, any schedule decreasing faster than $\Theta\left(1/t\right)$ leads to convergence to a constant value that depends on the initialization and the heterogeneity bound. Numerical results corroborate the theory, and deep learning experiments confirm its relevance for realistic settings.
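The abstract's central claim can be illustrated with a toy experiment. The sketch below is hypothetical and not taken from the paper: it runs heavy-ball momentum SGD on two quadratic worker objectives with cyclic participation (one worker per round) and a step-size schedule $\eta_t = \eta_0 / t^{1.5}$, which decays faster than $\Theta(1/t)$. Because the step sizes are summable, the iterate freezes at a point that depends on the initialization rather than reaching the minimizer of the average objective.

```python
import numpy as np

def momentum_run(x0, T=20000, beta=0.9, eta0=0.01, p=1.5):
    """Heavy-ball momentum SGD with cyclic client participation.

    Two workers hold quadratic losses f_i(x) = 0.5 * (x - a_i)^2 with
    minimizers a = [-5, +5]; the average loss is minimized at x* = 0.
    The step size eta0 / t**p with p > 1 decays faster than Theta(1/t).
    """
    a = np.array([-5.0, 5.0])
    x, v = x0, 0.0
    for t in range(1, T + 1):
        g = x - a[t % 2]           # cyclic participation: one worker per round
        v = beta * v + g           # momentum buffer
        x = x - (eta0 / t**p) * v  # summable step sizes => iterate freezes
    return x

x_from_0 = momentum_run(0.0)
x_from_10 = momentum_run(10.0)
print(x_from_0, x_from_10)  # limits differ depending on initialization
```

Running from two different starting points yields two different limiting values, neither of which is the global minimizer, consistent with the statement that the limit depends on both the initialization and the heterogeneity bound.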
Problem

Research questions and friction points this paper is trying to address.

Analyzing momentum's convergence limits under unbounded statistical heterogeneity
Proving momentum remains affected by heterogeneity despite decreasing step-sizes
Demonstrating momentum fails to guarantee convergence in decentralized federated learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzed momentum under cyclic client participation
Proved momentum remains affected by statistical heterogeneity
Showed decreasing step-sizes cannot eliminate heterogeneity-induced bias