🤖 AI Summary
This work addresses a source of aggregation error and training instability in federated LoRA: the rotational misalignment of low-rank factors across clients. Because a LoRA update is unchanged by an orthogonal change of basis applied to its two factors, different clients can encode semantically equivalent updates in different latent subspaces, and averaging their factors directly degrades the global update. We characterize this “rotation misalignment” problem and propose FedRot-LoRA, which applies orthogonal transformations to client-side LoRA factors before aggregation. This aligns their subspaces so that factor-wise averaging is less distorted by basis mismatch. The approach adds no communication overhead and does not restrict model expressivity, and a convergence analysis shows that alignment yields a tighter upper bound on the aggregation error. Experiments on natural language understanding and generation tasks show that the method consistently outperforms existing federated LoRA strategies across varying degrees of data heterogeneity and LoRA ranks.
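To make the averaging problem concrete, here is a small worked derivation. The notation is introduced only for illustration and is not taken from the paper: client weights $p_i$ with $\sum_i p_i = 1$, a common update $BA$ shared by all clients, and the averaged rotation $\bar{R}$. It compares the exact aggregate with the factor-wise (low-rank-preserving) aggregate when every client stores the same update in its own rotated basis, using $(BR_i)(R_i^\top A) = BA$ for orthogonal $R_i$:

$$
\begin{gathered}
\Delta W^{\mathrm{exact}} = \sum_{i=1}^{N} p_i B_i A_i,
\qquad
\Delta W^{\mathrm{fw}} = \Big(\sum_{i=1}^{N} p_i B_i\Big)\Big(\sum_{i=1}^{N} p_i A_i\Big),\\[4pt]
B_i = B R_i,\quad A_i = R_i^\top A
\;\;\Longrightarrow\;\;
\Delta W^{\mathrm{exact}} = BA,
\qquad
\Delta W^{\mathrm{fw}} = B\,\bar{R}\,\bar{R}^\top A,
\quad
\bar{R} = \sum_{i=1}^{N} p_i R_i .
\end{gathered}
$$

For full-rank $B$ and $A$, the factor-wise aggregate equals the true update only when $\bar{R}\bar{R}^\top = I_r$. With positive weights, that happens only if all $R_i$ coincide, so any cross-client rotation mismatch shows up directly as aggregation error.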
📝 Abstract
Federated LoRA provides a communication-efficient mechanism for fine-tuning large language models on decentralized data. In practice, however, a discrepancy between the factor-wise averaging used to preserve low rank and the mathematically correct aggregation of the local updates $B_i A_i$ can cause significant aggregation error and unstable training. We argue that a major source of this problem is rotational misalignment, arising from the rotational invariance of low-rank factorizations: semantically equivalent updates can be represented in different latent subspaces across clients, since $(B_i R_i)(R_i^\top A_i) = B_i A_i$ for any orthogonal $R_i$. When such misaligned factors are averaged directly, they interfere destructively and degrade the global update. To address this issue, we propose FedRot-LoRA, a federated LoRA framework that aligns client updates via orthogonal transformations prior to aggregation. This alignment preserves the semantic update while reducing cross-client subspace mismatch, without increasing communication cost or restricting model expressivity. We provide a convergence analysis that examines the aggregation error induced by factor-wise averaging and shows how rotational alignment yields a tighter upper bound on this error. Extensive experiments on natural language understanding and generative tasks demonstrate that FedRot-LoRA consistently outperforms existing federated LoRA baselines across a range of heterogeneity levels and LoRA ranks.
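The abstract does not say how the orthogonal transformations are computed. One standard way to rotate one factor onto another is the orthogonal Procrustes solution. The NumPy sketch below assumes that choice and uses client 0 as the alignment reference. Both are assumptions made for illustration, not FedRot-LoRA's actual procedure. All function names (`random_orthogonal`, `procrustes_rotation`, `align_factors`, `factorwise_average`) are hypothetical. The sketch builds an idealized case in which every client holds the same update in a differently rotated latent basis. It then compares naive factor-wise averaging with factor-wise averaging after alignment.

```python
import numpy as np


def random_orthogonal(r: int, rng: np.random.Generator) -> np.ndarray:
    """Sample an r x r orthogonal matrix from the QR factorization of a Gaussian matrix."""
    q, upper = np.linalg.qr(rng.standard_normal((r, r)))
    # Flip column signs to remove the sign ambiguity of the QR factorization.
    return q * np.sign(np.diag(upper))


def procrustes_rotation(b_client: np.ndarray, b_ref: np.ndarray) -> np.ndarray:
    """Orthogonal R minimizing ||b_client @ R - b_ref||_F (orthogonal Procrustes)."""
    u, _, vt = np.linalg.svd(b_client.T @ b_ref)
    return u @ vt


def align_factors(b: np.ndarray, a: np.ndarray, b_ref: np.ndarray):
    """Rotate one client's (B, A) pair toward b_ref; the product B @ A is unchanged."""
    r = procrustes_rotation(b, b_ref)
    return b @ r, r.T @ a


def factorwise_average(bs, a_list) -> np.ndarray:
    """Average the B and A factors separately, then multiply (low-rank-preserving)."""
    return np.mean(bs, axis=0) @ np.mean(a_list, axis=0)


rng = np.random.default_rng(0)
d, k, rank, n_clients = 32, 24, 4, 5
b_true = rng.standard_normal((d, rank))
a_true = rng.standard_normal((rank, k))
delta_true = b_true @ a_true

# Idealized setup: every client holds the same update in its own rotated latent basis.
rotations = [random_orthogonal(rank, rng) for _ in range(n_clients)]
bs = [b_true @ rot for rot in rotations]
a_list = [rot.T @ a_true for rot in rotations]

# Exact aggregate: average of the full products B_i @ A_i (uniform weights here).
exact = np.mean([b @ a for b, a in zip(bs, a_list)], axis=0)
naive = factorwise_average(bs, a_list)

# Align every client to client 0, a hypothetical choice of reference basis.
aligned = [align_factors(b, a, bs[0]) for b, a in zip(bs, a_list)]
aligned_avg = factorwise_average([b for b, _ in aligned], [a for _, a in aligned])

naive_err = np.linalg.norm(naive - exact) / np.linalg.norm(exact)
aligned_err = np.linalg.norm(aligned_avg - exact) / np.linalg.norm(exact)
print(f"relative error, naive factor-wise averaging: {naive_err:.3f}")
print(f"relative error, after Procrustes alignment:  {aligned_err:.2e}")
```

In this idealized setting, alignment recovers the exact aggregate because the clients differ only by a rotation. With heterogeneous data, the clients' updates genuinely differ, so alignment can address only the rotational part of the aggregation error. Each client's rotation is applied to both of its own factors, so every $B_i A_i$ is left unchanged. This matches the abstract's point that alignment preserves the update.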