AI Summary
Problem: Federated LoRA fine-tuning based on FedAvg suffers from cross-client interference during model aggregation and struggles to balance personalization with global knowledge acquisition.
Method: This paper proposes a novel federated LoRA paradigm that eliminates the need for global initialization. It introduces (1) a "Rest-of-the-World" LoRA, a selective adapter-sharing mechanism that lets clients exchange low-rank adapters, and (2) a Mixture-of-Experts (MoE) adaptive mixer that dynamically gates between each client's individual adapter and the integrated global knowledge via learnable, input-dependent weights.
Contribution/Results: The method preserves decentralized local training, computational efficiency, and privacy guarantees. Evaluated on NLP benchmarks, it significantly outperforms existing federated LoRA approaches, achieving state-of-the-art performance on local tasks while maintaining robust global model convergence.
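The gating idea in the summary can be made concrete with a minimal sketch. The names below (`W_gate`, the single-layer softmax gate, the rank-4 adapters) are illustrative assumptions, not the paper's actual architecture: the forward pass adds the frozen base projection to a mixture of the individual and Rest-of-the-World low-rank updates, with input-specific mixing weights.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 16, 16, 4

W0 = rng.normal(size=(d_out, d_in))                    # frozen base weight
# individual (locally trained) LoRA pair; B starts at zero as in standard LoRA
A_ind, B_ind = rng.normal(size=(r, d_in)), np.zeros((d_out, r))
# Rest-of-the-World LoRA pair (frozen, received from the server)
A_rotw, B_rotw = rng.normal(size=(r, d_in)), rng.normal(size=(d_out, r)) * 0.01
# hypothetical mixer: a tiny linear gate producing two mixing weights per input
W_gate = rng.normal(size=(2, d_in)) * 0.1

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def forward(x):
    g = softmax(W_gate @ x)                            # input-specific weights, sum to 1
    delta = g[0] * (B_ind @ (A_ind @ x)) + g[1] * (B_rotw @ (A_rotw @ x))
    return W0 @ x + delta

x = rng.normal(size=d_in)
y = forward(x)
```

In this reading, only `A_ind`, `B_ind`, and `W_gate` would be trained locally, so the gate can learn per-input how much shared knowledge to admit.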
Abstract
Fine-tuning large language models (LLMs) in federated settings enables privacy-preserving adaptation but suffers from cross-client interference due to model aggregation. Existing federated LoRA fine-tuning methods, primarily based on FedAvg, struggle with data heterogeneity, leading to harmful cross-client interference and suboptimal personalization. In this work, we propose FedALT, a novel personalized federated LoRA fine-tuning algorithm that fundamentally departs from FedAvg. Instead of using an aggregated model to initialize local training, each client continues training its individual LoRA while incorporating shared knowledge through a separate Rest-of-the-World (RoTW) LoRA component. To effectively balance local adaptation and global information, FedALT introduces an adaptive mixer that dynamically learns input-specific weightings between the individual and RoTW LoRA components using the Mixture-of-Experts (MoE) principle. Through extensive experiments on NLP benchmarks, we demonstrate that FedALT significantly outperforms state-of-the-art personalized federated LoRA fine-tuning methods, achieving superior local adaptation without sacrificing computational efficiency.
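One plausible server-side reading of the RoTW component, sketched below under the assumption (not stated in the abstract) that each client's RoTW adapter is the average of all *other* clients' individual adapters; the helper `rotw_adapters` is hypothetical:

```python
import numpy as np

def rotw_adapters(client_adapters):
    """For each client k, average the adapters of every client except k
    (one hedged interpretation of the Rest-of-the-World LoRA)."""
    n = len(client_adapters)
    total = sum(client_adapters)                 # elementwise sum over clients
    return [(total - a) / (n - 1) for a in client_adapters]

# toy 2x2 "adapters" for three clients, filled with 1.0, 2.0, 3.0
adapters = [np.full((2, 2), float(i)) for i in range(1, 4)]
rotw = rotw_adapters(adapters)
# client 0's RoTW averages clients 1 and 2: (2 + 3) / 2 = 2.5
```

Computing each client's leave-one-out average from the running total keeps the server pass O(n) rather than O(n^2), which matters when many clients participate per round.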