Gradient Compression May Hurt Generalization: A Remedy by Synthetic Data Guided Sharpness Aware Minimization

📅 2026-02-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the degradation of model generalization in federated learning under non-IID data when gradient compression is employed, as compression exacerbates the sharpness of the loss landscape. To mitigate this issue, the authors propose FedSynSAM, a framework that leverages synthetic data to facilitate sharpness-aware minimization (SAM) under compressed gradients. By enabling accurate estimation of global perturbations despite compression, FedSynSAM guides optimization toward flatter minima, thereby counteracting the adverse effects of both data heterogeneity and compression-induced perturbation bias. The method is theoretically shown to converge and, empirically, achieves significantly improved generalization performance while preserving communication efficiency.
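To make the compression side of this concrete, the following is a minimal sketch of top-k sparsification, one common gradient-compression scheme (the paper does not specify which compressor it studies, so this is an illustrative assumption). The entries that are zeroed out are exactly the kind of information loss that can bias the perturbation estimate discussed above.

```python
import numpy as np

def topk_compress(g, k):
    """Keep only the k largest-magnitude entries of a gradient vector;
    zero out the rest. Illustrative top-k sparsification, a common
    gradient-compression scheme in federated learning."""
    idx = np.argsort(np.abs(g))[-k:]   # indices of the k largest |g_i|
    sparse = np.zeros_like(g)
    sparse[idx] = g[idx]
    return sparse

g = np.array([0.1, -3.0, 0.02, 2.5, -0.4])
compressed = topk_compress(g, 2)       # only -3.0 and 2.5 survive
```

In a federated round, each client would transmit only the surviving entries (values plus indices), cutting communication roughly from the full model dimension down to k.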

📝 Abstract
It is commonly believed that gradient compression in federated learning (FL) achieves significant improvements in communication efficiency with negligible performance degradation. In this paper, we find that gradient compression induces sharper loss landscapes in federated learning, particularly under non-IID data distributions, which suggests hindered generalization capability. The recently emerging Sharpness Aware Minimization (SAM) effectively searches for a flat minimum by incorporating a gradient ascent step (i.e., perturbing the model with gradients) before the celebrated stochastic gradient descent. Nonetheless, the direct application of SAM in FL suffers from inaccurate estimation of the global perturbation due to data heterogeneity. Existing approaches propose to utilize the model update from the previous communication round as a rough estimate. However, their effectiveness is hindered when model update compression is incorporated. In this paper, we propose FedSynSAM, which leverages the global model trajectory to construct synthetic data and facilitates an accurate estimation of the global perturbation. The convergence of the proposed algorithm is established, and extensive experiments are conducted to validate its effectiveness.
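The ascent-then-descent step the abstract describes can be sketched as follows. This is a generic, centralized SAM update on a toy quadratic loss, not the paper's federated FedSynSAM algorithm; the function names and hyperparameter values (`rho`, `lr`) are illustrative assumptions.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One Sharpness-Aware Minimization (SAM) step.

    1. Ascent: perturb w along the normalized gradient to reach the
       (approximate) worst-case point in a ball of radius rho.
    2. Descent: apply a gradient-descent step using the gradient
       evaluated at that perturbed point.
    """
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # ascent perturbation
    g_perturbed = grad_fn(w + eps)               # gradient at perturbed model
    return w - lr * g_perturbed

# Toy loss L(w) = 0.5 * ||w||^2, whose gradient is simply w.
grad_fn = lambda w: w
w = np.array([1.0, -2.0])
for _ in range(100):
    w = sam_step(w, grad_fn)
# w approaches (but hovers slightly away from) the minimum at 0,
# since the rho-ball perturbation keeps probing the neighborhood.
```

In the federated setting the difficulty is step 1: each client only sees its own data, so under non-IID distributions the local ascent direction is a biased estimate of the global one, and compression of the transmitted updates makes that estimate worse still.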
Problem

Research questions and friction points this paper is trying to address.

gradient compression
generalization
federated learning
non-IID
Sharpness Aware Minimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Gradient Compression
Sharpness Aware Minimization
Federated Learning
Synthetic Data
Non-IID