🤖 AI Summary
This work addresses the gradient collapse issue that high-rank LoRA suffers during federated fine-tuning of large language models, which arises from statistical variance amplified by multi-client aggregation. Existing scaling strategies overlook the interaction between federated aggregation and adapter rank. To resolve this, we propose SFed-LoRA, a novel framework that provides the first theoretical characterization of how federated aggregation interacts with LoRA rank and derives an optimal scaling factor that depends on both the number of clients and the adapter rank. This scaling corrects aggregation-induced errors and stabilizes training without modifying the model architecture or increasing inference overhead. Extensive experiments demonstrate that SFed-LoRA significantly improves the convergence speed and stability of high-rank LoRA across diverse tasks, models, and heterogeneous data settings, outperforming current state-of-the-art baselines.
📝 Abstract
Large Language Models (LLMs) are pivotal in natural language processing. The impracticality of full fine-tuning has prompted Parameter-Efficient Fine-Tuning (PEFT) methods such as Low-Rank Adaptation (LoRA), which optimizes a pair of low-rank matrices A and B. In distributed scenarios where privacy constraints necessitate Federated Learning (FL), however, the integration of LoRA is often unstable. Specifically, we identify that aggregating updates from multiple clients introduces statistical variance that scales with the client count, causing gradient collapse when high-rank adapters are used. Existing scaling-factor candidates, such as the one used by Rank-Stabilized LoRA, ignore the interaction introduced by the aggregation process. To bridge this gap, this paper introduces Stabilized Federated LoRA (SFed-LoRA), a framework that theoretically characterizes the interaction between adapter rank and federated aggregation. We derive an optimal scaling factor designed to effectively mitigate the aggregation error accumulating across N clients. By correcting the scaling mismatch inherent in previous approaches, SFed-LoRA restores the efficacy of high-rank adaptation without altering the original model architecture or increasing inference latency. Extensive experiments across diverse tasks, model architectures, and heterogeneous data distributions validate our results. We demonstrate that SFed-LoRA prevents high-rank collapse and achieves significantly improved stability and faster convergence compared with state-of-the-art baselines for high-rank adaptation.
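The abstract's central observation, that naively averaging client adapters interacts badly with the low-rank structure, can be illustrated numerically. The NumPy sketch below is our own toy construction, not the paper's algorithm: the dimensions, random factors, and FedAvg-style separate averaging of A and B are all illustrative assumptions. It shows that averaging the factors separately across N clients yields a different (and much smaller) update than the average of the per-client products B_i A_i; that gap is the kind of aggregation error a client- and rank-aware scaling factor would need to correct.

```python
import numpy as np

rng = np.random.default_rng(0)
N, r, d = 8, 64, 256  # clients, adapter rank, model dimension (illustrative)

# Hypothetical per-client LoRA factors; client i's weight update is B_i @ A_i.
As = [rng.normal(size=(r, d)) / np.sqrt(d) for _ in range(N)]
Bs = [rng.normal(size=(d, r)) / np.sqrt(r) for _ in range(N)]

# The update the clients collectively intend: the mean of their full updates.
avg_of_products = sum(B @ A for B, A in zip(Bs, As)) / N

# What FedAvg-style aggregation of the factors produces: averaging A and B
# separately, then multiplying the averaged factors.
A_bar = sum(As) / N
B_bar = sum(Bs) / N
product_of_avgs = B_bar @ A_bar

# The two disagree: cross-client terms are lost, and the aggregated update
# shrinks relative to the intended one as N grows.
mismatch = np.linalg.norm(avg_of_products - product_of_avgs)
shrinkage = np.linalg.norm(product_of_avgs) / np.linalg.norm(avg_of_products)
print(f"mismatch={mismatch:.2f}, shrinkage ratio={shrinkage:.3f}")
```

For reference, standard LoRA scales the adapter update by α/r and Rank-Stabilized LoRA by α/√r; the abstract's point is that both scalings depend only on the rank r, while the aggregation error above also depends on the client count N.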