🤖 AI Summary
To address synchronization challenges in decentralized federated learning arising from heterogeneous and dynamically varying client computation and communication capabilities, this paper proposes a framework supporting sparse, non-uniform local updates and model exchanges. It introduces "sporadicity" as a unifying abstraction covering both local gradient updates and inter-client model exchanges, enabling robustness to time-varying heterogeneity. The work establishes convergence guarantees accommodating both convex and non-convex objectives, as well as constant or diminishing learning rates, and subsumes several classical decentralized algorithms as special cases. The algorithm models the per-iteration occurrence of gradient descent at each client and model exchange between client pairs as arbitrary indicator random variables, with the analysis accounting for communication graph connectivity, non-i.i.d. data heterogeneity, and gradient noise. The resulting convergence rates match state-of-the-art bounds, while extensive experiments demonstrate consistent training speedups over baseline methods across diverse system configurations.
📝 Abstract
Decentralized federated learning (DFL) captures FL settings where both (i) model updates and (ii) model aggregations are exclusively carried out by the clients without a central server. Existing DFL works have mostly focused on settings where clients conduct a fixed number of local updates between local model exchanges, overlooking heterogeneity and dynamics in communication and computation capabilities. In this work, we propose Decentralized Sporadic Federated Learning ($\texttt{DSpodFL}$), a DFL methodology built on a generalized notion of $\textit{sporadicity}$ in both local gradient and aggregation processes. $\texttt{DSpodFL}$ subsumes many existing decentralized optimization methods under a unified algorithmic framework by modeling the per-iteration (i) occurrence of gradient descent at each client and (ii) exchange of models between client pairs as arbitrary indicator random variables, thus capturing $\textit{heterogeneous and time-varying}$ computation/communication scenarios. We analytically characterize the convergence behavior of $\texttt{DSpodFL}$ for both convex and non-convex models and for both constant and diminishing learning rates, under mild assumptions on the communication graph connectivity, data heterogeneity across clients, and gradient noises. We show how our bounds recover existing results from decentralized gradient descent as special cases. Experiments demonstrate that $\texttt{DSpodFL}$ consistently achieves improved training speeds compared with baselines under various system settings.
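The mechanism the abstract describes — indicator random variables gating both local gradient steps and pairwise model exchanges — can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the quadratic objectives, per-client probabilities `p_grad`, ring graph, per-edge probability `p_comm`, mixing weight, and diminishing step-size schedule are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 3                               # clients, model dimension

# Client i holds f_i(x) = 0.5 * ||x - b_i||^2; distinct b_i mimic non-i.i.d. data
b = rng.normal(size=(n, d))
x = np.zeros((n, d))                      # local models

p_grad = np.array([0.9, 0.5, 0.7, 0.3])   # heterogeneous compute probabilities (assumed)
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]  # ring communication graph (assumed)
p_comm = 0.6                              # per-edge exchange probability (assumed)
mix = 0.5                                 # gossip mixing weight

for k in range(2000):
    alpha = 2.0 / (k + 20)                # diminishing learning rate

    # (i) sporadic local gradients: each client steps only when its
    # Bernoulli(p_grad[i]) indicator fires this iteration
    do_grad = (rng.random(n) < p_grad)[:, None]
    x = x - alpha * do_grad * (x - b)     # grad f_i(x_i) = x_i - b_i

    # (ii) sporadic model exchanges: each edge gossips only when its
    # Bernoulli(p_comm) indicator fires; symmetric mixing preserves the
    # network-wide model average
    x_new = x.copy()
    for i, j in edges:
        if rng.random() < p_comm:
            step = mix * (x[j] - x[i])
            x_new[i] += step
            x_new[j] -= step
    x = x_new

# Under this gating, the expected consensus fixed point is the
# p_grad-weighted mean of the b_i
target = (p_grad[:, None] * b).sum(axis=0) / p_grad.sum()
spread = np.max(np.linalg.norm(x - x.mean(axis=0), axis=1))   # consensus gap
err = np.linalg.norm(x.mean(axis=0) - target)                 # optimality gap
print(spread, err)
```

Setting every `p_grad[i]` and `p_comm` to 1 recovers standard decentralized gradient descent with gossip averaging, which is the sense in which classical methods appear as special cases of the sporadic framework.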