Controlled Low-Rank Adaptation with Subspace Regularization for Continued Training on Large Language Models

📅 2024-10-22
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) suffer from catastrophic forgetting during continual learning—i.e., significant performance degradation on previously learned tasks when adapting to new ones. To address this, we propose Controlled Low-Rank Adaptation (CLoRA), the first LoRA-based method to incorporate null-space direction constraints into adapter design. By enforcing subspace regularization on adapter outputs, CLoRA explicitly bounds output perturbations, thereby mitigating forgetting without compromising model capacity. As a parameter-efficient fine-tuning (PEFT) approach, CLoRA jointly optimizes adaptability to new tasks and stability on old ones. Experiments across single-stage fine-tuning and continual learning benchmarks demonstrate that CLoRA consistently outperforms standard LoRA: it reduces average performance drop on old tasks by 37%, while maintaining competitive accuracy on new tasks—effectively balancing model expressivity and forgetting suppression.

📝 Abstract
Large language models (LLMs) exhibit remarkable capabilities in natural language processing but face catastrophic forgetting when learning new tasks, where adaptation to a new domain leads to a substantial decline in performance on previous tasks. In this paper, we propose Controlled LoRA (CLoRA), a subspace regularization method on the LoRA structure. Aiming to reduce the scale of output change while introducing minimal constraints on model capacity, CLoRA imposes a constraint on the direction of the updating matrix's null space. Experimental results on one-stage LLM fine-tuning tasks and continual learning settings highlight the superiority of CLoRA as an effective parameter-efficient fine-tuning method that mitigates catastrophic forgetting. Further investigation of model parameters indicates that CLoRA effectively balances the trade-off between model capacity and degree of forgetting.
Problem

Research questions and friction points this paper is trying to address.

Mitigate catastrophic forgetting in large language models
Balance model capacity and forgetting trade-off
Improve parameter-efficient finetuning with subspace regularization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Controlled LoRA with subspace regularization
Constraint on null space direction
Balances capacity and forgetting trade-off
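
The null-space constraint above can be sketched as a regularization term: if the LoRA update is ΔW = BA and P is a fixed matrix whose columns are directions we want ΔW to leave unchanged, penalizing ‖ΔW·P‖²_F pushes those directions into the null space of the update, bounding output perturbations on old-task inputs. This is an illustrative reading of the abstract, not the authors' code; the function and variable names (`clora_regularizer`, `P`) are assumptions.

```python
import numpy as np

def clora_regularizer(A, B, P):
    """Squared Frobenius norm of (B @ A) @ P.

    Penalizes the component of the low-rank update dW = B @ A acting
    on the directions in P, pushing P toward the null space of dW
    (an illustrative sketch of CLoRA's subspace regularization).
    """
    dW = B @ A  # (d_out, d_in) low-rank update
    return np.linalg.norm(dW @ P, ord="fro") ** 2

# Toy shapes: d_in = d_out = 8, LoRA rank r = 2, k = 3 constrained directions
rng = np.random.default_rng(0)
A = rng.normal(size=(2, 8))  # LoRA "down" projection
B = rng.normal(size=(8, 2))  # LoRA "up" projection
P = rng.normal(size=(8, 3))  # directions to keep in the null space

loss = clora_regularizer(A, B, P)  # added to the task loss with some weight
```

In training, this term would be weighted and summed with the task loss; when the update is exactly zero (or P lies in its null space), the penalty vanishes, so it constrains direction rather than capping the update's magnitude everywhere.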
Authors

Yuheng Lu (Peking University)
Bingshuo Qian (Beijing University of Posts and Telecommunications)
Caixia Yuan (Beijing University of Posts and Telecommunications)
Huixing Jiang (Meituan Group)
Xiaojie Wang (Beijing University of Posts and Telecommunications)