Initialization using Update Approximation is a Silver Bullet for Extremely Efficient Low-Rank Fine-Tuning

📅 2024-11-29
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses the challenge that low-rank adaptation (LoRA) often fails to match the performance of full-parameter fine-tuning. The authors propose LoRA-SB, a theoretically grounded, parameter-efficient fine-tuning method with three key contributions: (1) the first theoretical proof that the optimal low-rank approximation of the full fine-tuning gradient is achievable within the structured LoRA-XS architecture; (2) a direction-preserving initialization strategy that keeps gradient update directions consistent throughout training; and (3) hyperparameter-free optimal scaling of high-rank gradient updates. Without any hyperparameter tuning, LoRA-SB consistently outperforms standard LoRA and LoRA-XS across mathematical reasoning, commonsense reasoning, and language understanding tasks. Notably, it achieves this while requiring only 1/27 to 1/90 of the trainable parameters of standard LoRA, significantly improving both the efficiency and generalization of low-rank adapters.

πŸ“ Abstract
Low-rank adapters have become standard for efficiently fine-tuning large language models (LLMs), but they often fall short of achieving the performance of full fine-tuning. We propose a method, LoRA Silver Bullet or LoRA-SB, that approximates full fine-tuning within low-rank subspaces using a carefully designed initialization strategy. We theoretically demonstrate that the architecture of LoRA-XS, which inserts a learnable (r x r) matrix between B and A while keeping other matrices fixed, provides the precise conditions needed for this approximation. We leverage its constrained update space to achieve optimal scaling for high-rank gradient updates while removing the need for hyperparameter tuning. We prove that our initialization offers an optimal low-rank approximation of the initial gradient and preserves update directions throughout training. Extensive experiments across mathematical reasoning, commonsense reasoning, and language understanding tasks demonstrate that our approach exceeds the performance of standard LoRA while using extbf{27-90} times fewer learnable parameters, and comprehensively outperforms LoRA-XS. Our findings establish that it is possible to simulate full fine-tuning in low-rank subspaces, and achieve significant efficiency gains without sacrificing performance. Our code is publicly available at https://github.com/RaghavSinghal10/lora-sb.
Problem

Research questions and friction points this paper is trying to address.

Efficient low-rank fine-tuning of LLMs
Achieving full fine-tuning performance with fewer parameters
Optimal scaling for high-rank gradient updates
Innovation

Methods, ideas, or system contributions that make the work stand out.

Initialization strategy optimizes low-rank adapters
LoRA-SB simulates full fine-tuning efficiently
Reduces trainable parameters 27–90× versus standard LoRA without performance loss