🤖 AI Summary
To address poor model generalization and client drift arising from non-i.i.d. data and multimodal heterogeneity across clients in federated learning, this paper proposes FedAPM, a novel ADMM-based federated learning framework. The method mitigates data bias through partial personalization of local models and constructs an augmented Lagrangian incorporating both first- and second-order proximal terms, with explicit updates of the Lagrange multipliers to ensure convergence stability. The authors establish global convergence from arbitrary initializations to stationary points, with constant, linear, and sublinear rates under mild assumptions. Extensive experiments on four heterogeneous multimodal datasets demonstrate that the approach significantly outperforms state-of-the-art methods, achieving average improvements of 12.3% in test accuracy, 16.4% in F1 score, and 18.0% in AUC, while requiring fewer communication rounds.
📝 Abstract
In federated learning (FL), the assumption that datasets from different devices are independent and identically distributed (i.i.d.) often does not hold due to user differences, and the presence of various data modalities across clients makes using a single model impractical. Personalizing certain parts of the model can effectively address these issues by allowing those parts to differ across clients, while the remaining parts serve as a shared model. However, we found that partial model personalization may exacerbate client drift (each client's local model diverges from the shared model), thereby reducing the effectiveness and efficiency of FL algorithms. We propose an FL framework based on the alternating direction method of multipliers (ADMM), referred to as FedAPM, to mitigate client drift. We construct the augmented Lagrangian function by incorporating first-order and second-order proximal terms into the objective, with the second-order term providing fixed correction and the first-order term offering compensatory correction between the local and shared models. Our analysis demonstrates that FedAPM, by using explicit estimates of the Lagrange multiplier, is more stable and efficient in terms of convergence compared to other FL frameworks. We establish the global convergence of FedAPM training from arbitrary initial points to a stationary point, achieving three types of rates: constant, linear, and sublinear, under mild assumptions. We conduct experiments using four heterogeneous and multimodal datasets with different metrics to validate the performance of FedAPM. Specifically, FedAPM achieves faster and more accurate convergence, outperforming the SOTA methods with average improvements of 12.3% in test accuracy, 16.4% in F1 score, and 18.0% in AUC while requiring fewer communication rounds.
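The ADMM structure described above — a local step on each client's copy of the shared model, a server aggregation step, and an explicit multiplier update — can be sketched on a toy quadratic problem. This is a minimal illustration, not the paper's algorithm: the loss `f_i`, the scalar parameters, and the names `u`, `lam`, `theta`, `rho` are assumptions made here for readability, and the inner-product multiplier term plays the role of the first-order (compensatory) correction while the quadratic penalty plays the role of the second-order (fixed) correction.

```python
# Toy consensus-ADMM sketch of partially personalized FL (illustrative only).
# Client i's loss is assumed to be f_i(u, theta) = 0.5*(u - a_i)^2 + 0.5*(theta - b_i)^2,
# where u is the client's copy of the shared parameter w and theta is its personal part.
def fedapm_sketch(a, b, rho=1.0, rounds=60):
    n = len(a)
    w = 0.0                 # shared (server) parameter
    u = [0.0] * n           # local copies of the shared parameter
    lam = [0.0] * n         # Lagrange multipliers: first-order (compensatory) term
    theta = [0.0] * n       # personalized parameters, never aggregated
    for _ in range(rounds):
        for i in range(n):
            # Local step: minimize f_i + lam_i*(u_i - w) + (rho/2)*(u_i - w)^2;
            # closed form for this quadratic toy loss.
            u[i] = (a[i] + rho * w - lam[i]) / (1.0 + rho)
            theta[i] = b[i]  # personal part optimized purely locally
        # Server step: aggregate shared copies (stationarity of the Lagrangian in w).
        w = sum(u[i] + lam[i] / rho for i in range(n)) / n
        for i in range(n):
            # Explicit multiplier update: rho*(u_i - w) is the second-order
            # (fixed) correction driving local and shared models into consensus.
            lam[i] += rho * (u[i] - w)
    return w, theta

# Usage: three heterogeneous clients; the shared parameter converges toward
# the consensus optimum mean(a), while each theta_i stays personalized at b_i.
w, theta = fedapm_sketch(a=[1.0, 2.0, 3.0], b=[-1.0, 0.0, 1.0])
```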