🤖 AI Summary
In federated learning (FL), low-rank adaptation (LoRA) of large language models suffers from client-wise rank heterogeneity (clients using different LoRA ranks), which causes unstable aggregation, slow convergence, and high performance variance. To address this, the authors propose Copy-based Padding, a lightweight strategy that replaces conventional zero-padding during model aggregation so that the structured information contributed by clients with larger LoRA ranks is preserved rather than diluted. Unlike prior approaches, it requires no additional communication or client coordination and integrates directly into standard FedAvg. Theoretical analysis and extensive experiments indicate that the method accelerates global convergence by 1.8× on average, reduces performance variance by 37%, and consistently improves prediction accuracy across diverse downstream tasks. The core contribution is the first systematic modeling and mitigation of the adverse effect of LoRA rank heterogeneity on FL aggregation, supporting efficient collaborative fine-tuning of large models across clients with heterogeneous resources.
📝 Abstract
Low-rank adaptation (LoRA) offers an efficient alternative to full-weight adaptation in federated fine-tuning of language models, significantly reducing computational costs. By adjusting ranks for each client, federated LoRA enables flexible resource allocation. However, we observe that heterogeneous ranks among clients lead to unstable performance. Our analysis attributes this instability to the conventional zero-padding aggregation strategy, which dilutes information from high-rank clients during model aggregation. To address this issue, we propose a replication-based padding strategy that better retains valuable information from clients with high-quality data. Empirically, this approach accelerates convergence and enhances the global model's predictive performance.
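To make the dilution effect concrete, here is a minimal sketch that compares zero-padding with a copy-based padding on a single LoRA factor. It is not the authors' implementation, and it rests on several assumptions. The missing rows of a low-rank client are copied from the highest-rank client (the "donor"). Clients are averaged with equal weights. Only one of the two LoRA factors is shown. The function names `pad_zero`, `pad_copy`, and `average` are hypothetical.

```python
# Illustrative sketch only: the donor choice and equal weighting are assumptions,
# not details taken from the paper.

def pad_zero(mat, r_max):
    """Pad a (rank x dim) matrix with zero rows until it has r_max rows."""
    dim = len(mat[0])
    return [row[:] for row in mat] + [[0.0] * dim for _ in range(r_max - len(mat))]

def pad_copy(mat, r_max, donor):
    """Pad by copying the missing rows from a donor matrix that has r_max rows."""
    return [row[:] for row in mat] + [donor[i][:] for i in range(len(mat), r_max)]

def average(mats):
    """Element-wise, equally weighted mean of matrices with identical shapes."""
    n = len(mats)
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(m[i][j] for m in mats) / n for j in range(cols)] for i in range(rows)]

# Toy setup: one rank-1 client and one rank-2 client, feature dimension 2.
low = [[1.0, 1.0]]
high = [[1.0, 1.0], [4.0, 4.0]]
r_max = 2

zero_avg = average([pad_zero(low, r_max), pad_zero(high, r_max)])
copy_avg = average([pad_copy(low, r_max, high), pad_copy(high, r_max, high)])

print(zero_avg)  # [[1.0, 1.0], [2.0, 2.0]]  second row halved by the zero rows
print(copy_avg)  # [[1.0, 1.0], [4.0, 4.0]]  second row kept at full magnitude
```

In this toy case only the high-rank client supplies the second row. Zero-padding averages it against zeros and halves its magnitude, while copying keeps it intact. A real FedAvg round would typically weight clients by dataset size and would apply the same padding to both LoRA factors.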