AI Summary
To address the degraded performance of sparse training in cross-device federated learning (FL) caused by data heterogeneity, non-adaptive masking strategies, and complex hyperparameter tuning, this paper proposes SparsyFed, a practical federated sparse training framework. SparsyFed jointly achieves client consensus on sparse models, data-aware mask adaptation, and optimization controlled by a single hyperparameter. Without additional tuning, it maintains near-lossless accuracy at 95% sparsity. Its per-round weight regrowth is 200 times smaller than that of prior methods (roughly 0.5% of the baseline). Under severe data heterogeneity, SparsyFed outperforms all existing sparse FL approaches while preserving high communication efficiency, low computational overhead, and robustness across diverse client distributions.
Abstract
Sparse training is often adopted in cross-device federated learning (FL) environments, where constrained devices collaboratively train a machine learning model on private data by exchanging pseudo-gradients across heterogeneous networks. Although sparse training methods can reduce communication overhead and computational burden in FL, they are often not used in practice for the following key reasons: (1) data heterogeneity makes it harder for clients to reach consensus on sparse models compared to dense ones, requiring longer training; (2) methods for obtaining sparse masks lack adaptivity to accommodate very heterogeneous data distributions, which is crucial in cross-device FL; and (3) additional hyperparameters are required, which are notably challenging to tune in FL. This paper presents SparsyFed, a practical federated sparse training method that critically addresses the problems above. Previous works have solved only one or two of these challenges, at the expense of introducing new trade-offs, such as clients' consensus on masks versus sparsity pattern adaptivity. We show that SparsyFed simultaneously (1) produces 95% sparse models with negligible degradation in accuracy while needing only a single hyperparameter, (2) achieves a per-round weight regrowth 200 times smaller than that of previous methods, and (3) allows the sparse masks to adapt to highly heterogeneous data distributions, outperforming all baselines under such conditions.