🤖 AI Summary
To address catastrophic forgetting in LoRA-based fine-tuning, this paper proposes LaLoRA, the first method to apply the Laplace approximation in the LoRA weight space, enabling lightweight weight-space regularization. By constraining parameter updates along high-curvature directions of the loss landscape, LaLoRA preserves pretraining knowledge while enhancing downstream task performance. Crucially, the regularization operates solely on the low-rank incremental matrices, adding no inference overhead and supporting a tunable trade-off between learning and forgetting. Its core innovation is modeling confidence in the LoRA parameters via loss-curvature estimation, unifying parameter efficiency, knowledge stability, and robustness. In mathematical-reasoning fine-tuning experiments on a Llama model, LaLoRA significantly improves the forgetting–performance trade-off: the regularization strength directly controls the degree of forgetting, and the method is robust to data-sampling variation and hyperparameter choices.
📝 Abstract
Parameter-efficient fine-tuning methods, such as Low-Rank Adaptation (LoRA), enable fast specialization of large pre-trained models to different downstream applications. However, this process often leads to catastrophic forgetting of the model's prior domain knowledge. We address this issue with LaLoRA, a weight-space regularization technique that applies a Laplace approximation to Low-Rank Adaptation. Our approach estimates the model's confidence in each parameter and constrains updates in high-curvature directions, preserving prior knowledge while enabling efficient target-domain learning. By applying the Laplace approximation only to the LoRA weights, the method remains lightweight. We evaluate LaLoRA by fine-tuning a Llama model for mathematical reasoning and demonstrate an improved learning-forgetting trade-off, which can be directly controlled via the method's regularization strength. We further explore different loss landscape curvature approximations for estimating parameter confidence, analyze the effect of the data used for the Laplace approximation, and study robustness across hyperparameters.
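The core mechanism described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the authors' implementation: it assumes a diagonal curvature (Fisher) estimate over the LoRA factors and a quadratic penalty that shrinks updates most strongly in high-curvature directions. All names (`forward`, `diag_fisher`, `laplace_penalty`, `lam`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 8, 2                              # layer width, LoRA rank
W0 = rng.normal(size=(d, d))             # frozen pretrained weight
A = rng.normal(scale=0.1, size=(r, d))   # LoRA down-projection (trainable)
B = np.zeros((d, r))                     # LoRA up-projection (standard zero init)

def forward(x, A, B):
    # Effective weight is W0 + B @ A; only the low-rank factors are trained.
    return x @ (W0 + B @ A).T

def diag_fisher(per_sample_grads):
    # Diagonal Fisher / curvature proxy: mean of squared per-sample gradients.
    # High values ~ directions where the model is "confident".
    return np.mean(np.stack(per_sample_grads) ** 2, axis=0)

def laplace_penalty(theta, theta_map, fisher, lam):
    # (lam/2) * sum_i F_i * (theta_i - theta_i^MAP)^2
    # Added to the task loss; lam tunes the learning-forgetting trade-off.
    return 0.5 * lam * np.sum(fisher * (theta - theta_map) ** 2)

# Example: the penalty vanishes at the MAP estimate and scales with curvature.
fisher = np.array([1.0, 4.0, 0.0])
print(laplace_penalty(np.ones(3), np.zeros(3), fisher, lam=2.0))  # → 5.0
```

Because the penalty is applied only to the low-rank deltas (here a 3-parameter toy vector standing in for the flattened `A`/`B` entries), the curvature estimate stays cheap relative to a full-weight-space Laplace approximation, which is the lightweight property the abstract highlights.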