🤖 AI Summary
This work addresses the gradient collapse issue that high-rank LoRA suffers during federated fine-tuning of large language models, which arises from statistical variance amplified by multi-client aggregation. Existing scaling strategies overlook the interaction between federated aggregation and adapter rank. To resolve this, we propose SFed-LoRA, a novel framework that provides the first theoretical characterization of how federated aggregation interacts with LoRA rank and derives an optimal scaling factor that depends on both the number of clients and the adapter rank. This scaling corrects aggregation-induced errors and stabilizes training without modifying the model architecture or increasing inference overhead. Extensive experiments demonstrate that SFed-LoRA significantly improves the convergence speed and stability of high-rank LoRA across diverse tasks, models, and heterogeneous data settings, outperforming current state-of-the-art baselines.
📝 Abstract
Large Language Models (LLMs) are pivotal in natural language processing. The impracticality of full fine-tuning has prompted Parameter-Efficient Fine-Tuning (PEFT) methods such as Low-Rank Adaptation (LoRA), which optimizes a pair of low-rank matrices A and B. In distributed scenarios where privacy constraints necessitate Federated Learning (FL), however, the integration of LoRA is often unstable. Specifically, we identify that aggregating updates from multiple clients introduces statistical variance that scales with the client count, causing gradient collapse when high-rank adapters are used. Existing scaling-factor candidates, such as the one used by Rank-Stabilized LoRA, ignore the interaction introduced by the aggregation process. To bridge this gap, this paper introduces Stabilized Federated LoRA (SFed-LoRA), a framework that theoretically characterizes the interaction between adapter rank and federated aggregation. We derive an optimal scaling factor designed to effectively mitigate the aggregation error accumulating across N clients. By correcting the scaling mismatch inherent in previous approaches, SFed-LoRA restores the efficacy of high-rank adaptation without altering the original model architecture or increasing inference latency. Extensive experiments across diverse tasks, model architectures, and heterogeneous data distributions validate our results. We demonstrate that SFed-LoRA prevents high-rank collapse and achieves significantly improved stability and faster convergence compared with state-of-the-art baselines for high-rank adaptation.
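The abstract's central observation, that naively averaging client adapters interacts badly with the low-rank structure, can be illustrated numerically. The NumPy sketch below is our own toy construction, not the paper's algorithm: the dimensions, random factors, and FedAvg-style separate averaging of A and B are all illustrative assumptions. It shows that averaging the factors separately across N clients yields a different (and much smaller) update than the average of the per-client products B_i A_i; that gap is the kind of aggregation error a client- and rank-aware scaling factor would need to correct.

```python
import numpy as np

rng = np.random.default_rng(0)
N, r, d = 8, 64, 256  # clients, adapter rank, model dimension (illustrative)

# Hypothetical per-client LoRA factors; client i's weight update is B_i @ A_i.
As = [rng.normal(size=(r, d)) / np.sqrt(d) for _ in range(N)]
Bs = [rng.normal(size=(d, r)) / np.sqrt(r) for _ in range(N)]

# The update the clients collectively intend: the mean of their full updates.
avg_of_products = sum(B @ A for B, A in zip(Bs, As)) / N

# What FedAvg-style aggregation of the factors produces: averaging A and B
# separately, then multiplying the averaged factors.
A_bar = sum(As) / N
B_bar = sum(Bs) / N
product_of_avgs = B_bar @ A_bar

# The two disagree: cross-client terms are lost, and the aggregated update
# shrinks relative to the intended one as N grows.
mismatch = np.linalg.norm(avg_of_products - product_of_avgs)
shrinkage = np.linalg.norm(product_of_avgs) / np.linalg.norm(avg_of_products)
print(f"mismatch={mismatch:.2f}, shrinkage ratio={shrinkage:.3f}")
```

For reference, standard LoRA scales the adapter update by α/r and Rank-Stabilized LoRA by α/√r; the abstract's point is that both scalings depend only on the rank r, while the aggregation error above also depends on the client count N.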