🤖 AI Summary
FedLoRA faces two key bottlenecks in federated fine-tuning: (1) large local-global generalization gaps due to imprecise client updates, and (2) high communication overhead. To address these, we propose FLoRA-NA, which introduces a proxy aggregation matrix on the server. Leveraging only the low-rank LoRA parameters uploaded by clients, FLoRA-NA performs efficient and accurate aggregation via gradient approximation and matrix reconstruction—without requiring additional communication. This enables the server to closely approximate the global update, substantially narrowing the performance gap between personalized local models and the generalized global model. Extensive experiments across diverse tasks—including natural language understanding, mathematical reasoning, and code generation—and multiple foundation models demonstrate that FLoRA-NA consistently achieves state-of-the-art global performance while incurring minimal communication cost (transmitting LoRA weights only), thereby offering both computational efficiency and strong generalization capability.
📝 Abstract
With the rapid emergence of foundation models and the increasing need for fine-tuning across distributed environments, Federated Low-Rank Adaptation (FedLoRA) has recently gained significant attention. Despite its enormous potential, current FedLoRA methods face notable challenges due to inexact updates. Existing approaches have attempted to mitigate this issue, but they often introduce a \emph{local-global generalization gap} and incur \emph{substantial communication overhead}, limiting their scalability and effectiveness. To address these limitations, we propose \textbf{F}ederated \textbf{Lo}w-\textbf{R}ank \textbf{A}ggregation with \textbf{N}early \textbf{A}ccurate Estimation (FLoRA-NA). FLoRA-NA leverages the local LoRA matrices on the server to estimate the aggregated matrices $\hat{A}$ and $\hat{B}$, which are then distributed to clients for local updates. These surrogate aggregated matrices minimize the divergence between the ideal update $\nabla \bar{W} = \sum^{U}_{u=1} B_u A_u$ and the practical update $\nabla \hat{W} = \hat{B}\hat{A}$ without adding communication cost beyond vanilla FedLoRA. By doing so, FLoRA-NA achieves communication efficiency and bridges the gap between local personalization and global generalization, addressing a key limitation of prior personalized FedLoRA approaches. We conduct extensive evaluations across diverse tasks, including natural language understanding, mathematical reasoning, and code generation, using various foundation models. Experimental results consistently demonstrate that FLoRA-NA achieves state-of-the-art global performance while maintaining low communication overhead.
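To make the inexact-update problem concrete, here is a minimal NumPy sketch of the gap the abstract describes: averaging the LoRA factors $B_u$ and $A_u$ separately (vanilla FedLoRA) differs from the ideal aggregate of the products $B_u A_u$. The truncated-SVD refactorization below is only an illustrative stand-in for a surrogate $\hat{B}, \hat{A}$ pair; it is not the paper's actual gradient-approximation estimator, and all dimensions and client counts are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
U, d, k, r = 4, 32, 16, 4  # hypothetical: clients, out dim, in dim, LoRA rank

# Per-client LoRA factors: B_u is (d x r), A_u is (r x k).
Bs = [rng.normal(size=(d, r)) for _ in range(U)]
As = [rng.normal(size=(r, k)) for _ in range(U)]

# Ideal aggregated update from the abstract: sum_u B_u A_u.
ideal = sum(B @ A for B, A in zip(Bs, As))

# Vanilla FedLoRA: aggregate factors separately, then multiply.
# (sum B_u)(sum A_u) != sum (B_u A_u), so this update is inexact.
naive = sum(Bs) @ sum(As)

# One way to build a surrogate rank-r pair (hat{B}, hat{A}) on the server:
# refactor the ideal update via truncated SVD. This minimizes the
# Frobenius divergence || ideal - hat{B} hat{A} || over all rank-r pairs.
Usvd, s, Vt = np.linalg.svd(ideal, full_matrices=False)
B_hat = Usvd[:, :r] * s[:r]   # absorb singular values into hat{B}
A_hat = Vt[:r]

err_naive = np.linalg.norm(ideal - naive)
err_hat = np.linalg.norm(ideal - B_hat @ A_hat)
assert err_hat <= err_naive  # the surrogate tracks the ideal update better
```

Because the truncated SVD is the optimal rank-$r$ approximation in Frobenius norm, the surrogate's divergence from the ideal update can never exceed that of the separately averaged factors; FLoRA-NA's estimator targets the same divergence using only the uploaded low-rank parameters.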