SparsyFed: Sparse Adaptive Federated Training

📅 2025-04-07
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
To address degraded sparse-training performance in cross-device federated learning (FL) caused by data heterogeneity, non-adaptive masking strategies, and complex hyperparameter tuning, this paper proposes SparsyFed, a practical federated sparse training framework. SparsyFed introduces dynamic mask regeneration and client-local sparse-pattern adaptation to jointly achieve model consensus, data-aware masking, and optimization controlled by a single hyperparameter. Without additional tuning, it maintains near-lossless accuracy at 95% sparsity. Per-round weight regrowth is reduced to just 0.5% of that of baseline methods (a 200-fold reduction). Under severe data heterogeneity, SparsyFed consistently outperforms all existing sparse FL approaches, while simultaneously ensuring high communication efficiency, low computational overhead, and robustness across diverse client distributions.

๐Ÿ“ Abstract
Sparse training is often adopted in cross-device federated learning (FL) environments where constrained devices collaboratively train a machine learning model on private data by exchanging pseudo-gradients across heterogeneous networks. Although sparse training methods can reduce communication overhead and computational burden in FL, they are often not used in practice for the following key reasons: (1) data heterogeneity makes it harder for clients to reach consensus on sparse models compared to dense ones, requiring longer training; (2) methods for obtaining sparse masks lack adaptivity to accommodate very heterogeneous data distributions, crucial in cross-device FL; and (3) additional hyperparameters are required, which are notably challenging to tune in FL. This paper presents SparsyFed, a practical federated sparse training method that critically addresses the problems above. Previous works have only solved one or two of these challenges at the expense of introducing new trade-offs, such as clients' consensus on masks versus sparsity pattern adaptivity. We show that SparsyFed simultaneously (1) can produce 95% sparse models, with negligible degradation in accuracy, while only needing a single hyperparameter, (2) achieves a per-round weight regrowth 200 times smaller than previous methods, and (3) allows the sparse masks to adapt to highly heterogeneous data distributions and outperform all baselines under such conditions.
Problem

Research questions and friction points this paper is trying to address.

Addresses data heterogeneity in federated sparse training
Improves adaptivity of sparse masks to heterogeneous data
Reduces hyperparameter tuning complexity in federated learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive sparse masks for heterogeneous data
Single hyperparameter for 95% sparsity
Minimal weight regrowth per training round
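This summary does not spell out SparsyFed's actual mask-regeneration procedure, but the general idea behind magnitude-based sparsification at a fixed sparsity level, where the sparsity fraction is the single tunable knob, can be sketched as follows. The function name `topk_magnitude_mask` and the use of NumPy are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def topk_magnitude_mask(weights: np.ndarray, sparsity: float = 0.95) -> np.ndarray:
    """Return a binary mask keeping only the largest-magnitude weights.

    `sparsity` is the fraction of weights zeroed out, so 0.95 keeps the
    top 5% by absolute value. The sparsity level is the only hyperparameter,
    echoing the single-knob design the paper advertises.
    """
    flat = np.abs(weights).ravel()
    k = max(1, int(round(flat.size * (1.0 - sparsity))))  # number of weights to keep
    # Threshold = magnitude of the k-th largest weight
    threshold = np.partition(flat, -k)[-k]
    return (np.abs(weights) >= threshold).astype(weights.dtype)

# Example: a client sparsifies its local update before communicating it
rng = np.random.default_rng(0)
w = rng.normal(size=(100, 100))
mask = topk_magnitude_mask(w, sparsity=0.95)
sparse_w = w * mask  # only ~5% of entries remain nonzero
```

In a federated round, each client would recompute such a mask on its local weights, letting the sparse pattern adapt to that client's data distribution before aggregation.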