LoRA vs Full Fine-tuning: An Illusion of Equivalence

📅 2024-10-28
🏛️ arXiv.org
📈 Citations: 17
Influential: 2
🤖 AI Summary
Despite comparable downstream performance, LoRA and full-parameter fine-tuning exhibit fundamental differences in representation learning, particularly in the spectral structure of the weight matrices and in generalization behavior. Method: singular value spectrum analysis of fine-tuned weight matrices, comparative structural characterization, and sequential multi-task adaptation experiments probing how LoRA's low-rank updates interact with pretrained representations. Contribution/Results: The paper identifies "intruder dimensions": novel high-ranking singular vector directions introduced by LoRA that do not appear under full fine-tuning, and which degrade retention of the pre-training distribution and weaken robustness in sequential multi-task learning. It shows that LoRA and full fine-tuning explore distinct regions of parameter space even when target-task performance matches. Higher-rank and rank-stabilized LoRA variants approximate the spectral characteristics of full fine-tuning and suppress intruder dimensions, improving knowledge preservation and transfer robustness. These findings provide both empirical grounding and practical guidance for designing efficient, robust adaptation methods.
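The core diagnostic in the summary above can be sketched in a few lines: take the SVD of the pretrained and fine-tuned weight matrices, then flag any top fine-tuned singular vector whose best cosine similarity against the pretrained singular vectors is low. This is a minimal sketch, not the paper's exact procedure; the function name, the threshold `tau`, and the toy rank-1 update below are illustrative assumptions.

```python
import numpy as np

def intruder_dimensions(W_pre, W_ft, k=10, tau=0.5):
    """Flag indices of the top-k left singular vectors of the fine-tuned
    weight matrix whose maximum |cosine similarity| to any pretrained
    left singular vector falls below tau (tau is a hypothetical knob)."""
    U_pre, _, _ = np.linalg.svd(W_pre, full_matrices=False)
    U_ft, _, _ = np.linalg.svd(W_ft, full_matrices=False)
    intruders = []
    for i in range(min(k, U_ft.shape[1])):
        # similarity of the i-th fine-tuned direction to every pretrained one
        sims = np.abs(U_pre.T @ U_ft[:, i])
        if sims.max() < tau:
            intruders.append(i)
    return intruders

# Toy demo: a weak "pretrained" matrix plus one large rank-1 update,
# mimicking how a low-rank adapter can inject a new dominant direction.
rng = np.random.default_rng(0)
W_pre = 0.01 * rng.standard_normal((64, 64))
u = rng.standard_normal(64); v = rng.standard_normal(64)
W_ft = W_pre + 5.0 * np.outer(u / np.linalg.norm(u), v / np.linalg.norm(v))
print(intruder_dimensions(W_pre, W_ft))  # indices of off-spectrum directions
```

Comparing against the identical matrix yields no intruders, since every fine-tuned singular vector then matches a pretrained one exactly.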

📝 Abstract
Fine-tuning is a crucial paradigm for adapting pre-trained large language models to downstream tasks. Recently, methods like Low-Rank Adaptation (LoRA) have been shown to match the performance of fully fine-tuned models on various tasks with an extreme reduction in the number of trainable parameters. Even in settings where both methods learn similarly accurate models, "are their learned solutions really equivalent?" We study how different fine-tuning methods change pre-trained models by analyzing the model's weight matrices through the lens of their spectral properties. We find that full fine-tuning and LoRA yield weight matrices whose singular value decompositions exhibit very different structure; moreover, the fine-tuned models themselves show distinct generalization behaviors when tested outside the adaptation task's distribution. More specifically, we first show that the weight matrices trained with LoRA have new, high-ranking singular vectors, which we call "intruder dimensions". Intruder dimensions do not appear during full fine-tuning. Second, we show that LoRA models with intruder dimensions, despite achieving similar performance to full fine-tuning on the target task, become worse models of the pre-training distribution and adapt less robustly to multiple tasks sequentially. Higher-rank, rank-stabilized LoRA models closely mirror full fine-tuning, even when performing on par with lower-rank LoRA models on the same tasks. These results suggest that models updated with LoRA and full fine-tuning access different parts of parameter space, even when they perform equally on the fine-tuned distribution. We conclude by examining why intruder dimensions appear in LoRA fine-tuned models, why they are undesirable, and how their effects can be minimized.
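The abstract contrasts standard LoRA with its rank-stabilized variant. The difference is only the scaling of the low-rank update: standard LoRA scales B·A by α/r, while rank-stabilized LoRA (rsLoRA) scales by α/√r, so the update's magnitude does not shrink as the rank grows. A minimal sketch, with the function name and the specific dimensions below chosen for illustration:

```python
import numpy as np

def lora_update(W, A, B, alpha, rank_stabilized=False):
    """Apply a LoRA update W + s * (B @ A).
    Standard LoRA uses s = alpha / r; rank-stabilized LoRA uses
    s = alpha / sqrt(r), keeping the update scale stable as r grows."""
    r = A.shape[0]
    s = alpha / np.sqrt(r) if rank_stabilized else alpha / r
    return W + s * (B @ A)

rng = np.random.default_rng(0)
d, r, alpha = 128, 16, 16.0
W = rng.standard_normal((d, d))
A = rng.standard_normal((r, d))
B = np.zeros((d, r))  # B is initialized to zero in LoRA
assert np.allclose(lora_update(W, A, B, alpha), W)  # zero update at init
```

At equal rank r, the rsLoRA update is exactly √r times the standard one, which is why higher ranks remain effective under rank stabilization.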
Problem

Research questions and friction points this paper is trying to address.

Assessing whether LoRA and full fine-tuning learn equivalent solutions in LLMs
Analyzing spectral properties of weight matrices post-fine-tuning
Investigating intruder dimensions' impact on forgetting in LoRA
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzes spectral properties of weight matrices
Identifies intruder dimensions in LoRA
Scales intruder dimensions to reduce forgetting
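The last bullet refers to damping the flagged directions: rebuild the fine-tuned weight matrix from its SVD with the intruder singular values scaled down. A minimal sketch of that operation; the function name and the scaling `factor` are hypothetical, standing in for whatever schedule the paper evaluates.

```python
import numpy as np

def scale_intruders(W_ft, intruder_idx, factor=0.5):
    """Reconstruct the fine-tuned weight with the singular values of the
    flagged intruder directions multiplied by `factor` (0 removes them,
    1 leaves the matrix unchanged)."""
    U, S, Vt = np.linalg.svd(W_ft, full_matrices=False)
    S = S.copy()
    S[list(intruder_idx)] *= factor
    return (U * S) @ Vt  # U * S scales each column of U by its singular value
```

Passing an empty index list reproduces the original matrix up to floating-point error, and a factor of 0 on all indices zeroes the matrix, which makes the operation easy to sanity-check.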