OPLoRA: Orthogonal Projection LoRA Prevents Catastrophic Forgetting during Parameter-Efficient Fine-Tuning

📅 2025-10-14
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
LoRA-based parameter-efficient fine-tuning often suffers from catastrophic forgetting, primarily because low-rank updates perturb the dominant singular directions of the pretrained weights. To address this, we propose OPLoRA, a method that applies double-sided orthogonal projections with respect to the left and right top-$k$ singular subspaces (spanned by $U_k$ and $V_k$, respectively). We prove that OPLoRA exactly preserves the top-$k$ singular triplets of the pretrained weight matrix, and we introduce a subspace-interference metric $\rho_k$ to quantify forgetting risk. OPLoRA decomposes the frozen backbone weights via SVD and constrains LoRA updates with the projections $P_L = I - U_k U_k^\top$ and $P_R = I - V_k V_k^\top$, enabling parameter-efficient adaptation while guaranteeing retention of the dominant directions. Experiments on LLaMA-2 7B and Qwen2.5 7B demonstrate that OPLoRA significantly mitigates forgetting while achieving performance on par with or superior to standard LoRA across commonsense reasoning, mathematical problem solving, and code generation tasks.
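The construction above can be sketched in a few lines of NumPy. The dimensions, variable names, and verification step below are illustrative choices, not taken from the paper; the key point is that a projected update $P_L (BA) P_R$ leaves each top-$k$ singular triplet of $W$ unchanged, since $P_R V_k = 0$ and $U_k^\top P_L = 0$.

```python
# Minimal sketch of OPLoRA's bilateral projection (dimensions are illustrative).
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, k, r = 64, 48, 8, 4

W = rng.standard_normal((d_out, d_in))            # frozen pretrained weight
U, s, Vt = np.linalg.svd(W, full_matrices=False)
U_k, V_k = U[:, :k], Vt[:k, :].T                  # top-k singular subspaces

P_L = np.eye(d_out) - U_k @ U_k.T                 # left projector  (I - U_k U_k^T)
P_R = np.eye(d_in) - V_k @ V_k.T                  # right projector (I - V_k V_k^T)

B = rng.standard_normal((d_out, r))               # LoRA factors (rank r)
A = rng.standard_normal((r, d_in))
delta = P_L @ (B @ A) @ P_R                       # projected low-rank update

# Each top-k triplet still satisfies (W + delta) v_i = sigma_i u_i exactly,
# because delta annihilates V_k on the right.
err = np.max(np.abs((W + delta) @ V_k - U_k * s[:k]))
print(err)
```

Running this prints an error on the order of floating-point noise, which is the "exact preservation" guarantee restated numerically.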

πŸ“ Abstract
Low-Rank Adaptation (LoRA) enables efficient fine-tuning of large language models but suffers from catastrophic forgetting when learned updates interfere with the dominant singular directions that encode essential pre-trained knowledge. We propose Orthogonal Projection LoRA (OPLoRA), a theoretically grounded approach that prevents this interference through double-sided orthogonal projections. By decomposing frozen weights via SVD, OPLoRA constrains LoRA updates to lie entirely within the orthogonal complement of the top-$k$ singular subspace using projections $P_L = I - U_k U_k^\top$ and $P_R = I - V_k V_k^\top$. We prove that this construction exactly preserves the top-$k$ singular triples, providing mathematical guarantees for knowledge retention. To quantify subspace interference, we introduce $\rho_k$, a metric measuring update alignment with dominant directions. Extensive experiments across commonsense reasoning, mathematics, and code generation demonstrate that OPLoRA significantly reduces forgetting while maintaining competitive task-specific performance on LLaMA-2 7B and Qwen2.5 7B, establishing orthogonal projection as an effective mechanism for knowledge preservation in parameter-efficient fine-tuning.
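The abstract does not reproduce the exact definition of $\rho_k$, so the sketch below uses an illustrative proxy: the fraction of an update's Frobenius energy that falls inside the top-$k$ left and right singular subspaces. The function name and the averaging over the two sides are my assumptions, not the paper's formula; the sketch only shows why a projected update scores zero on any such alignment measure.

```python
# Hedged sketch of a subspace-interference measure in the spirit of rho_k.
# The exact paper definition is not given here; this proxy is an assumption.
import numpy as np

def interference(delta_w, U_k, V_k):
    """Fraction of ||delta_w||_F^2 overlapping span(U_k) / span(V_k)."""
    left = U_k @ (U_k.T @ delta_w)        # component hitting the left subspace
    right = (delta_w @ V_k) @ V_k.T       # component hitting the right subspace
    total = np.linalg.norm(delta_w) ** 2
    return (np.linalg.norm(left) ** 2 + np.linalg.norm(right) ** 2) / (2 * total)

rng = np.random.default_rng(1)
W = rng.standard_normal((32, 24))
U, _, Vt = np.linalg.svd(W, full_matrices=False)
U_k, V_k = U[:, :4], Vt[:4, :].T

raw = rng.standard_normal((32, 24))                        # unconstrained update
projected = (np.eye(32) - U_k @ U_k.T) @ raw @ (np.eye(24) - V_k @ V_k.T)

print(interference(raw, U_k, V_k))        # nonzero for a generic update
print(interference(projected, U_k, V_k))  # ~0 after bilateral projection
```

An unconstrained random update overlaps the dominant subspaces roughly in proportion to $k/d$, while the doubly projected update has (up to floating-point error) zero overlap, which is the forgetting-risk distinction the metric is meant to capture.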
Problem

Research questions and friction points this paper is trying to address.

Preventing catastrophic forgetting in LoRA fine-tuning of language models
Constraining parameter updates to orthogonal subspaces via projections
Preserving essential pre-trained knowledge while enabling task adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Orthogonal projection prevents catastrophic forgetting
Constrains updates to orthogonal complement via SVD
Preserves top-k singular triples mathematically
🔎 Similar Papers
No similar papers found.