🤖 AI Summary
This work addresses the challenge in federated learning where multiple local updates under data heterogeneity often drive the global model toward sharp minima, degrading generalization. Existing sharpness-aware methods struggle to align local and global flatness. To this end, we propose FedNSAM, an algorithm that leverages global Nesterov momentum to guide local updates, constructing an estimated direction of global perturbation and performing extrapolation to harmonize local and global flatness. We introduce a novel “flatness distance” metric to quantify the inconsistency between local and global landscapes and establish a tighter convergence bound than FedSAM in our theoretical analysis. Empirical results demonstrate that FedNSAM significantly enhances both generalization performance and training efficiency across CNN and Transformer architectures, particularly in highly heterogeneous settings.
📝 Abstract
In federated learning (FL), multi-step local updates and data heterogeneity usually lead to sharper global minima, which degrades the performance of the global model. Popular FL algorithms integrate sharpness-aware minimization (SAM) into local training to address this issue. However, under high data heterogeneity, flatness achieved in local training does not imply flatness of the global model. Therefore, minimizing the sharpness of local loss surfaces on client data does not ensure that SAM improves the generalization ability of the global model in FL. We define the flatness distance to explain this phenomenon. By rethinking SAM in FL and theoretically analyzing the flatness distance, we propose a novel algorithm, FedNSAM, which accelerates SAM by introducing global Nesterov momentum into the local update to harmonize global and local flatness. FedNSAM uses the global Nesterov momentum both as each client's locally estimated direction of the global perturbation and for extrapolation. Theoretically, we prove a tighter convergence bound than FedSAM via Nesterov extrapolation. Empirically, we conduct comprehensive experiments on CNN and Transformer models to verify the superior performance and efficiency of FedNSAM. The code is available at https://github.com/junkangLiu0/FedNSAM.
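To make the core idea concrete, here is a minimal NumPy sketch of one local step in this style. It is not the paper's actual implementation: the function name, the momentum coefficient `beta`, and the exact way the momentum enters the update are assumptions for illustration. The only point it demonstrates is the mechanism described above: the SAM perturbation direction comes from the shared global Nesterov momentum rather than the local gradient (as in plain FedSAM), so local sharpness is probed along a globally consistent direction.

```python
import numpy as np

def fednsam_local_step(w, grad_fn, m_global, lr=0.01, rho=0.05, beta=0.9):
    """Hypothetical sketch of one FedNSAM-style local update.

    w        : current local weights (1-D array)
    grad_fn  : callable returning the local gradient at given weights
    m_global : global Nesterov momentum broadcast by the server (assumed)
    rho      : SAM perturbation radius
    beta     : assumed momentum coefficient for the extrapolated step
    """
    # Build the SAM perturbation along the normalized *global* momentum,
    # instead of the normalized local gradient used by plain FedSAM.
    eps = rho * m_global / (np.linalg.norm(m_global) + 1e-12)
    # Usual SAM ascent-then-descend: take the gradient at the perturbed point.
    g = grad_fn(w + eps)
    # Nesterov-style extrapolated descent step (combination is assumed).
    return w - lr * (g + beta * m_global)
```

A toy usage on the quadratic loss `||w||^2` (gradient `2w`), with an arbitrary momentum vector, shows the weights moving toward the minimum while the perturbation direction stays fixed by `m_global`.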