Ravan: Multi-Head Low-Rank Adaptation for Federated Fine-Tuning

📅 2025-06-05
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
In federated learning (FL), fine-tuning large language models (LLMs) faces challenges including privacy preservation and degraded LoRA performance caused by client data and computational heterogeneity. To address these, this paper proposes Ravan, an adaptive multi-head low-rank adaptation (LoRA) method. Methodologically, Ravan introduces (1) a multi-head LoRA architecture with learnable scaling factors that approximates high-rank updates without increasing communication overhead, and (2) a decoupling of the optimization paths of the LoRA core matrices and scaling parameters in FL fine-tuning, improving convergence stability across heterogeneous devices. Evaluated on vision and language benchmarks, Ravan achieves average test accuracy gains of 2–8% over state-of-the-art parameter-efficient baselines. It improves the robustness, representational capacity, and scalability of federated LLM fine-tuning while preserving privacy and accommodating system heterogeneity.

๐Ÿ“ Abstract
Large language models (LLMs) have not yet effectively leveraged the vast amounts of edge-device data, and federated learning (FL) offers a promising paradigm to collaboratively fine-tune LLMs without transferring private edge data to the cloud. To operate within the computation and communication constraints of edge devices, recent literature on federated fine-tuning of LLMs proposes the use of low-rank adaptation (LoRA) and similar parameter-efficient methods. However, LoRA-based methods suffer from accuracy degradation in FL settings, primarily because of data and computational heterogeneity across clients. We propose Ravan, an adaptive multi-head LoRA method that balances parameter efficiency and model expressivity by reparameterizing the weight updates as the sum of multiple LoRA heads $s_i \mathbf{B}_i \mathbf{H}_i \mathbf{A}_i$, in which only the core matrices $\mathbf{H}_i$ and their lightweight scaling factors $s_i$ are trained. These trainable scaling factors let the optimization focus on the most useful heads, recovering a higher-rank approximation of the full update without increasing the number of communicated parameters, since clients upload $s_i \mathbf{H}_i$ directly. Experiments on vision and language benchmarks show that Ravan improves test accuracy by 2–8% over prior parameter-efficient baselines, making it a robust and scalable solution for federated fine-tuning of LLMs.
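The multi-head reparameterization described in the abstract can be sketched in a few lines of PyTorch. This is a minimal illustrative layer, not the authors' implementation: the class name, initialization scheme, and hyperparameters (`num_heads`, `rank`) are assumptions. It shows the key structural idea, frozen projections $\mathbf{B}_i$, $\mathbf{A}_i$ with trainable $r \times r$ cores $\mathbf{H}_i$ and scalar scaling factors $s_i$, so that only $s_i \mathbf{H}_i$ would need to be communicated.

```python
import torch
import torch.nn as nn

class RavanLinear(nn.Module):
    """Illustrative multi-head LoRA layer: delta_W = sum_i s_i * B_i @ H_i @ A_i.

    B_i and A_i are frozen per-head projections; only the small core
    matrices H_i and the per-head scaling factors s_i are trained.
    Shapes and init are assumptions for the sketch, not the paper's spec.
    """

    def __init__(self, base: nn.Linear, num_heads: int = 4, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weights stay frozen
        out_f, in_f = base.weight.shape
        # Frozen down-projections A_i (rank x in) and up-projections B_i (out x rank).
        self.A = nn.ParameterList(
            [nn.Parameter(torch.randn(rank, in_f) * 0.02, requires_grad=False)
             for _ in range(num_heads)])
        self.B = nn.ParameterList(
            [nn.Parameter(torch.randn(out_f, rank) * 0.02, requires_grad=False)
             for _ in range(num_heads)])
        # Trainable r x r cores (zero-init so the update starts as a no-op)
        # and lightweight per-head scaling factors s_i.
        self.H = nn.ParameterList(
            [nn.Parameter(torch.zeros(rank, rank)) for _ in range(num_heads)])
        self.s = nn.Parameter(torch.ones(num_heads))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.base(x)
        for i in range(len(self.H)):
            # x @ A_i^T @ H_i^T @ B_i^T == (s_i B_i H_i A_i) applied to x
            out = out + self.s[i] * (x @ self.A[i].T @ self.H[i].T @ self.B[i].T)
        return out
```

Because the cores are zero-initialized, the layer initially behaves exactly like the frozen base layer, and the only trainable (hence communicated) tensors are the `H` cores and the scaling vector `s`.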
Problem

Research questions and friction points this paper is trying to address.

Federated fine-tuning of LLMs with edge-device data constraints
Accuracy degradation in LoRA-based FL due to heterogeneity
Balancing parameter efficiency and model expressivity adaptively
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-head LoRA for federated fine-tuning
Adaptive scaling factors optimize head utility
Reparameterized weight updates enhance model expressivity