🤖 AI Summary
To address LoRA's slow convergence and poor alignment efficiency in complex models, this paper proposes LoRA-One: a method that achieves precise alignment with the critical singular subspaces via a single full-gradient update coupled with spectral initialization. Theoretically, we establish the first rigorous proof that these two preprocessing steps can substitute for conventional multi-step LoRA optimization, jointly guaranteeing subspace alignment, linear convergence, and bounded generalization error; we further reveal the intrinsic acceleration mechanism of preconditioning in high-rank adaptation. Methodologically, LoRA-One integrates spectral initialization, gradient preconditioning, matrix sensing analysis, and decoupled learning dynamics. Experiments demonstrate that LoRA-One significantly outperforms standard LoRA and leading variants across multiple benchmarks. Notably, our theoretical analysis shows that spectral initialization alone already possesses inherent feature-learning capability. The implementation is publicly available.
📝 Abstract
This paper studies how to improve the performance of Low-Rank Adaptation (LoRA) as guided by our theoretical analysis. Our first set of theoretical results shows that, for random initialization and linear models, *i)* LoRA aligns to a certain singular subspace of the one-step gradient of full fine-tuning; *ii)* preconditioners improve convergence in the high-rank case. These insights motivate us to focus on preconditioned LoRA using a specific spectral initialization strategy for aligning with certain subspaces. For both linear and nonlinear models, we prove that alignment and generalization guarantees can be achieved directly at initialization, and that the subsequent linear convergence can also be established. Our analysis leads to the *LoRA-One* algorithm (using *One*-step gradient and preconditioning), a theoretically grounded algorithm that achieves significant empirical improvement over vanilla LoRA and its variants on several benchmarks. Our theoretical analysis, based on decoupling the learning dynamics and characterizing how spectral initialization contributes to feature learning, may be of independent interest for understanding matrix sensing and deep learning theory. The source code is available at https://github.com/YuanheZ/LoRA-One.
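To make the core idea concrete, here is a minimal numpy sketch of a spectral initialization built from one full-fine-tuning gradient: take the top-r SVD of the gradient and split its singular values between the two low-rank factors. The function name, the even square-root split, and the sign/scaling conventions are illustrative assumptions, not the exact recipe from the paper.

```python
import numpy as np

def spectral_init_from_gradient(grad, rank, scale=1.0):
    """Sketch of spectral initialization from a one-step full gradient.

    grad : full fine-tuning gradient at the pretrained weights, shape (d_out, d_in).
    Returns factors B (d_out, rank) and A (rank, d_in) whose product
    B @ A equals -scale times the best rank-`rank` approximation of grad,
    i.e. a gradient-descent-like step restricted to the top singular subspace.
    """
    U, S, Vt = np.linalg.svd(grad, full_matrices=False)
    sqrt_S = np.sqrt(S[:rank])
    # Split each singular value evenly (as a square root) between the
    # two factors so the initialization is balanced.
    B = -scale * U[:, :rank] * sqrt_S      # scale columns of U_r
    A = sqrt_S[:, None] * Vt[:rank]        # scale rows of V_r^T
    return B, A
```

With this initialization the adapter already points along the dominant singular subspace of the full-fine-tuning gradient, which is the alignment property the analysis attributes to the first step; subsequent training would then update `A` and `B` (optionally with preconditioning).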