FedSWA: Improving Generalization in Federated Learning with Highly Heterogeneous Data via Momentum-Based Stochastic Controlled Weight Averaging

📅 2025-07-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the degradation of generalization in federated learning under highly non-IID data, where sharpness-aware methods such as FedSAM can even underperform FedAvg, this paper proposes two algorithms. FedSWA applies Stochastic Weight Averaging (SWA) to seek flatter minima when client data are highly heterogeneous. FedMoSWA extends it with momentum-based stochastic controlled weight averaging, designed to better align local and global models. The theoretical analysis gives convergence results and generalization bounds for both algorithms and shows that the optimization and generalization errors of FedMoSWA are smaller than those of FedSAM and its variants. Experiments on CIFAR-10, CIFAR-100, and Tiny ImageNet show that the proposed algorithms outperform their counterparts. The implementation is publicly available.

📝 Abstract

For federated learning (FL) algorithms such as FedSAM, their generalization capability is crucial for real-world applications. In this paper, we revisit the generalization problem in FL and investigate the impact of data heterogeneity on FL generalization. We find that FedSAM usually performs worse than FedAvg in the case of highly heterogeneous data, and thus propose a novel and effective federated learning algorithm with Stochastic Weight Averaging (called FedSWA), which aims to find flatter minima in the setting of highly heterogeneous data. Moreover, we introduce a new momentum-based stochastic controlled weight averaging FL algorithm (FedMoSWA), which is designed to better align local and global models. Theoretically, we provide both convergence analysis and generalization bounds for FedSWA and FedMoSWA. We also prove that the optimization and generalization errors of FedMoSWA are smaller than those of their counterparts, including FedSAM and its variants. Empirically, experimental results on CIFAR10/100 and Tiny ImageNet demonstrate the superiority of the proposed algorithms compared to their counterparts. The open-source code is available at: https://github.com/junkangLiu0/FedSWA.
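To make the weight-averaging idea concrete, the sketch below shows generic stochastic weight averaging inside a FedAvg-style round. It is not the paper's FedSWA algorithm. The scalar quadratic client losses, the function names (local_sgd_with_swa, federated_round), and every hyperparameter are assumptions chosen for illustration. Each client averages its later local iterates instead of returning only the last one, and the server averages the client results.

```python
# Illustrative only: generic SWA in a FedAvg-style loop on a toy problem.
# Names, losses, and hyperparameters are hypothetical, not from the paper.

def local_sgd_with_swa(w_global, target, lr=0.1, steps=20, swa_start=10):
    """Run local gradient steps on f(w) = 0.5 * (w - target)^2 and return
    the running average of the iterates from step `swa_start` onward."""
    w = w_global
    w_avg, n_avg = 0.0, 0
    for step in range(steps):
        grad = w - target                 # gradient of the toy quadratic loss
        w = w - lr * grad
        if step >= swa_start:
            n_avg += 1
            w_avg += (w - w_avg) / n_avg  # incremental mean of the iterates
    return w_avg


def federated_round(w_global, client_targets):
    """One round: every client trains locally, the server averages."""
    local_models = [local_sgd_with_swa(w_global, t) for t in client_targets]
    return sum(local_models) / len(local_models)


# Heterogeneous clients: each has a different optimum (hypothetical values).
client_targets = [-2.0, 0.5, 4.0]

w = 0.0
for _ in range(30):
    w = federated_round(w, client_targets)
```

On this toy problem the global model settles at the mean of the client optima. Real models would average parameter tensors rather than scalars, and the paper's actual schedule and averaging rule should be taken from the linked repository.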

Problem

Research questions and friction points this paper is trying to address.

Improving generalization in federated learning with heterogeneous data

Addressing performance decline of FedSAM in highly heterogeneous data

Aligning local and global models via momentum-based weight averaging

Innovation

Methods, ideas, or system contributions that make the work stand out.

Momentum-based stochastic controlled weight averaging

Finds flatter minima for heterogeneous data

Better aligns local and global models
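As a rough illustration of how control variates can align local and global updates, the sketch below applies a SCAFFOLD-style gradient correction on toy scalar quadratics. It is an assumption-laden sketch, not the FedMoSWA update rule. The helper momentum_average, the beta parameter, and the client losses are hypothetical. The comparison at the end measures client drift, meaning the spread of local models after local training, with and without the correction.

```python
# Illustrative only: SCAFFOLD-style control variates on a toy problem.
# This is NOT the paper's FedMoSWA rule; all names here are hypothetical.

def corrected_local_steps(w_global, target, c_local, c_global, lr=0.1, steps=20):
    """Local steps on f(w) = 0.5 * (w - target)^2 with the gradient shifted
    by (c_global - c_local) to counteract client-specific drift."""
    w = w_global
    for _ in range(steps):
        grad = w - target                      # toy quadratic gradient
        w -= lr * (grad - c_local + c_global)  # control-variate correction
    return w


def momentum_average(c_global, client_cs, beta=0.5):
    """Assumed form of a momentum-style update of the global control variate:
    an exponential moving average toward the mean of the client variates."""
    mean_c = sum(client_cs) / len(client_cs)
    return (1.0 - beta) * c_global + beta * mean_c


client_targets = [-2.0, 0.5, 4.0]  # heterogeneous client optima (hypothetical)
w_global = 0.0

# Each client's control variate is its gradient at the global model.
client_cs = [w_global - t for t in client_targets]
c_global = momentum_average(0.0, client_cs, beta=1.0)  # beta=1.0 is a plain mean

plain = [corrected_local_steps(w_global, t, 0.0, 0.0) for t in client_targets]
corrected = [
    corrected_local_steps(w_global, t, c_i, c_global)
    for t, c_i in zip(client_targets, client_cs)
]

drift_plain = max(plain) - min(plain)              # spread without correction
drift_corrected = max(corrected) - min(corrected)  # spread with correction
```

Because all toy clients share the same curvature, the correction makes every client follow the global gradient and the drift vanishes up to rounding. With real, differently curved losses the correction reduces drift rather than removing it.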

🔎 Similar Papers

Communication-Efficient Heterogeneous Federated Learning with Generalized Heavy-Ball Momentum