🤖 AI Summary
To address the slow convergence and low accuracy of FedAvg in LoRA-based federated learning under data heterogeneity, this paper proposes a novel parameter aggregation method grounded in Robust Principal Component Analysis (Robust-PCA). It is the first to incorporate Robust-PCA into LoRA federated aggregation, decomposing client updates into shared low-rank components and personalized sparse components. A dual-path aggregation mechanism—“low-rank averaging + scaled averaging of sparse components”—is designed to overcome the failure of Task Arithmetic under highly similar client updates. Evaluated on multimodal vision-language tasks, the method achieves significant improvements in final accuracy (+1.8–3.2%) and accelerates convergence by up to 2.1×, consistently outperforming baselines including FedAvg, FedPer, and LoRA-MoE.
📝 Abstract
LoRA has emerged as one of the most promising fine-tuning techniques, especially for federated learning (FL), since it significantly reduces communication and computation costs at resource-constrained clients. However, data heterogeneity remains a significant challenge for LoRA-based FL, and the conventional aggregation strategy based on FedAvg suffers from slow convergence and suboptimal accuracy. Motivated by recent advances in model merging, particularly Task Arithmetic, we explore the idea of aggregating client LoRA parameters using scaled averaging. We first observe that a naive application of Task Arithmetic is ineffective due to the high cosine similarity between client updates, indicating significant common knowledge in the updates across clients. To address this issue, we propose decomposing client LoRA updates via Robust Principal Component Analysis (Robust-PCA) into a common low-rank component and client-specific sparse components. Our proposed algorithm FedRPCA aggregates the low-rank components through averaging, consolidating common knowledge, and applies scaled averaging to the sparse components to amplify client-specific knowledge. We evaluate our approach across a variety of vision and language tasks and demonstrate that it achieves higher final accuracy and faster convergence compared to competing baselines.