Stabilized Fine-Tuning with LoRA in Federated Learning: Mitigating the Side Effect of Client Size and Rank via the Scaling Factor

📅 2026-03-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the gradient collapse issue in high-rank LoRA during federated fine-tuning of large language models, which arises from amplified statistical variance due to multi-client aggregation. Existing scaling strategies overlook the interaction between federated aggregation and adapter rank. To resolve this, we propose SFed-LoRA, a novel framework that theoretically characterizes the impact of federated aggregation on LoRA rank for the first time and derives an optimal scaling factor dependent on both the number of clients and the adapter rank. This scaling effectively corrects aggregation-induced errors and stabilizes training without modifying model architecture or increasing inference overhead. Extensive experiments demonstrate that SFed-LoRA significantly improves convergence speed and stability of high-rank LoRA across diverse tasks, models, and heterogeneous data settings, outperforming current state-of-the-art baselines.

📝 Abstract
Large Language Models (LLMs) are pivotal in natural language processing. The impracticality of full fine-tuning has prompted Parameter-Efficient Fine-Tuning (PEFT) methods like Low-Rank Adaptation (LoRA), which optimizes low-rank matrices A and B. In distributed scenarios where privacy constraints necessitate Federated Learning (FL), however, the integration of LoRA is often unstable. Specifically, we identify that aggregating updates from multiple clients introduces statistical variance that scales with the client count, causing gradient collapse when using high-rank adapters. Existing scaling factor candidates, such as the one used by Rank-Stabilized LoRA, ignore the interaction caused by the aggregation process. To bridge this gap, this paper introduces Stabilized Federated LoRA (SFed-LoRA), a framework that theoretically characterizes the interaction between adapter rank and federated aggregation. We derive an optimal scaling factor designed to effectively mitigate the aggregation error accumulating across N clients. By correcting the scaling mismatch inherent in previous approaches, SFed-LoRA restores the efficacy of high-rank adaptation without altering the original model architecture or increasing inference latency. Extensive experiments across diverse tasks, model architectures, and heterogeneous data distributions validate our results. We demonstrate that SFed-LoRA prevents high-rank collapse and achieves significantly improved stability and faster convergence compared with state-of-the-art baselines for high-rank adaptation.
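To make the scaling-factor question concrete, the sketch below contrasts the vanilla LoRA factor (α/r) and the Rank-Stabilized LoRA factor (α/√r) with a hypothetical factor depending on both the rank r and the client count N, applied to FedAvg-aggregated adapters. The paper's actual derived factor is not reproduced here, so the `fed_aware` entry is an illustrative assumption only; all dimensions and the random "locally trained" adapters are likewise made up for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, r, num_clients = 16, 16, 8, 4  # layer dims, adapter rank, clients (illustrative)
alpha = 16.0

def lora_delta(A, B, scale):
    """LoRA weight update: scale * B @ A, with B: (d, r) and A: (r, k)."""
    return scale * (B @ A)

# Stand-ins for locally trained per-client adapters (in standard LoRA,
# B starts at zero; here both are small random matrices for illustration).
clients = [(rng.normal(0, 0.02, (r, k)), rng.normal(0, 0.02, (d, r)))
           for _ in range(num_clients)]

# FedAvg-style aggregation: average A and B separately across clients.
A_avg = np.mean([A for A, _ in clients], axis=0)
B_avg = np.mean([B for _, B in clients], axis=0)

# Candidate scaling factors. The first two are standard; "fed_aware" is a
# hypothetical rank-and-client-dependent factor, NOT the paper's formula.
scales = {
    "lora":      alpha / r,                           # vanilla LoRA
    "rslora":    alpha / np.sqrt(r),                  # Rank-Stabilized LoRA
    "fed_aware": alpha / np.sqrt(r * num_clients),    # illustrative assumption
}

for name, s in scales.items():
    delta = lora_delta(A_avg, B_avg, s)
    print(f"{name}: scale={s:.3f}, ||delta W||_F = {np.linalg.norm(delta):.4f}")
```

Note how α/r shrinks the aggregated update much faster than α/√r as r grows, which is the mismatch the abstract attributes to rank-agnostic scaling; a factor that also accounts for N would additionally compensate for the variance introduced by averaging.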
Problem

Research questions and friction points this paper is trying to address.

Federated Learning
Low-Rank Adaptation
Parameter-Efficient Fine-Tuning
Gradient Collapse
Client Heterogeneity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated Learning
LoRA
Parameter-Efficient Fine-Tuning
Scaling Factor
High-Rank Adaptation
Jiayu Huang
Beijing University of Posts and Telecommunications, Beijing, China
Xiaohu Wu
Beijing University of Posts and Telecommunications, Beijing, China
Tiantian He
PhD student, University College London
Qicheng Lao
Beijing University of Posts and Telecommunications, Beijing, China