🤖 AI Summary
In differentially private federated learning (DP-FL), LoRA fine-tuning suffers from quadratic noise amplification: DP-SGD perturbs per-sample gradients of both low-rank factors, and multiplying the noisy factors in the update BA compounds that noise. Freezing the A matrix reduces the noise but limits representational capacity. To address this, the paper proposes an SVD-based global reparameterization: clients train and upload only the low-rank matrix B, and the server aggregates the B matrices, forms the product BA with the previous A, and refactorizes it via singular value decomposition (SVD) to obtain a new A with orthonormal rows together with an updated B. The summary describes this as the first work to integrate SVD reparameterization into private federated LoRA training. Because A keeps adapting across rounds, the method retains more expressivity than frozen-A LoRA while structurally avoiding quadratic noise amplification, and the orthonormal structure of A yields a theoretical bound on the gradient norm of B. Experiments across multiple privacy budgets and benchmark tasks show improved stability and accuracy, outperforming baselines including frozen-A LoRA and standard LoRA.
📝 Abstract
Low-Rank Adaptation (LoRA), which introduces a product of two trainable low-rank matrices into frozen pre-trained weights, is widely used for efficient fine-tuning of language models in federated learning (FL). However, when combined with differentially private stochastic gradient descent (DP-SGD), LoRA faces substantial noise amplification: DP-SGD perturbs per-sample gradients, and the matrix multiplication of the LoRA update ($BA$) intensifies this effect. Freezing one matrix (e.g., $A$) reduces the noise but restricts model expressiveness, often resulting in suboptimal adaptation. To address this, we propose FedSVD, a simple yet effective method that introduces a global reparameterization based on singular value decomposition (SVD). In our approach, each client optimizes only the $B$ matrix and transmits it to the server. The server aggregates the $B$ matrices, computes the product $BA$ using the previous $A$, and refactorizes the result via SVD. This yields a new adaptive $A$ composed of the orthonormal right singular vectors of $BA$, and an updated $B$ containing the remaining SVD components. This reparameterization avoids quadratic noise amplification, while allowing $A$ to better capture the principal directions of the aggregate updates. Moreover, the orthonormal structure of $A$ bounds the gradient norms of $B$ and preserves more signal under DP-SGD, as confirmed by our theoretical analysis. As a result, FedSVD consistently improves stability and performance across a variety of privacy settings and benchmarks, outperforming relevant baselines under both private and non-private regimes.
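The server-side step described above (aggregate the $B$ matrices, form $BA$ with the previous $A$, refactorize via SVD) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `fedsvd_server_step`, the unweighted mean over clients, the shapes ($B \in \mathbb{R}^{d_{out} \times r}$, $A \in \mathbb{R}^{r \times d_{in}}$), and the choice to fold the singular values into $B$ are all assumptions made for illustration.

```python
import numpy as np

def fedsvd_server_step(client_Bs, A_prev):
    """Hypothetical sketch of one server round: aggregate B, refactorize B @ A via SVD.

    client_Bs: list of (d_out, r) arrays uploaded by clients.
    A_prev:    (r, d_in) array used by all clients in the previous round.
    """
    r = A_prev.shape[0]
    # Assumption: simple unweighted averaging of the client B matrices.
    B_avg = np.mean(np.stack(client_Bs), axis=0)
    # Low-rank update implied by the aggregated B and the previous A.
    delta_W = B_avg @ A_prev
    # Thin SVD: delta_W = U @ diag(S) @ Vt, with rank at most r.
    U, S, Vt = np.linalg.svd(delta_W, full_matrices=False)
    # New A: top-r right singular vectors (orthonormal rows).
    A_new = Vt[:r]
    # New B: remaining SVD components (left singular vectors scaled by S).
    B_new = U[:, :r] * S[:r]
    return B_new, A_new

# Toy usage with made-up dimensions.
rng = np.random.default_rng(0)
d_out, d_in, r = 8, 6, 2
A_prev = rng.standard_normal((r, d_in))
client_Bs = [rng.standard_normal((d_out, r)) for _ in range(3)]
B_new, A_new = fedsvd_server_step(client_Bs, A_prev)
```

In this sketch the refactorization leaves the product unchanged (`B_new @ A_new` equals the aggregated `B_avg @ A_prev` up to floating-point error), while `A_new @ A_new.T` is the identity. One consequence of the orthonormal rows is that any perturbation `xi` applied to `B` satisfies `||xi @ A_new||_F = ||xi||_F`, so right-multiplying by $A$ does not rescale noise added to $B$.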