Merge before Forget: A Single LoRA Continual Learning via Continual Merging

📅 2025-12-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing parameter-efficient continual learning (PECL) methods for large language models (LLMs) suffer from memory bloat, storage overhead, and task interference due to the absence of LoRA module merging during training. Method: We propose a single-LoRA dynamic continual learning framework that eliminates redundant adapters via on-the-fly merging. Our approach introduces three key innovations: (1) a novel “merge-before-forget” paradigm enabling incremental knowledge integration within a single-LoRA architecture; (2) orthogonal basis initialization to ensure subspace decoupling across tasks; and (3) a time-aware asymmetric LoRA scaling mechanism that adaptively balances contributions from new and old tasks. Theoretically grounded continual merging guarantees constant O(1) memory complexity. Results: Evaluated on multiple continual learning benchmarks, our method outperforms existing LoRA-based PECL approaches—achieving average accuracy gains of 3.2–5.7% on Llama-family models—while effectively mitigating catastrophic forgetting and task interference.
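The time-aware asymmetric scaling described above can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy, not the paper's implementation: the function name `time_aware_merge` and the `alpha0 / t` decay schedule are hypothetical stand-ins for whatever schedule the authors derive; the point is only that each new task's low-rank update is folded into one weight matrix with a scale that shrinks over time, so memory stays O(1) in the number of tasks.

```python
import numpy as np

def time_aware_merge(W, B_new, A_new, t, alpha0=1.0):
    """Fold a new task's LoRA update (B_new @ A_new) into the single
    merged weight matrix W. The scale decays with the task index t
    (illustrative alpha0 / t schedule; the paper's exact form may differ),
    so later tasks perturb accumulated knowledge less."""
    scale = alpha0 / t
    return W + scale * (B_new @ A_new)

# Toy run: d_out = d_in = 4, rank r = 2, three sequential tasks.
rng = np.random.default_rng(0)
W = np.zeros((4, 4))
for t in range(1, 4):
    B = rng.standard_normal((4, 2))  # stand-in for a trained adapter
    A = rng.standard_normal((2, 4))
    W = time_aware_merge(W, B, A, t)
# Only W plus the current (B, A) pair is ever stored.
```

Note that only the merged matrix and the current adapter pair live in memory at any point, which is what gives the constant memory complexity claimed in the summary.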

📝 Abstract
Parameter-efficient continual learning has emerged as a promising approach for large language models (LLMs) to mitigate catastrophic forgetting while enabling adaptation to new tasks. Current Low-Rank Adaptation (LoRA) continual learning techniques often retain and freeze previously learned LoRAs, or generate data representations, to overcome forgetting, typically using these to help new LoRAs learn new tasks. However, these methods not only incur computational memory and storage costs that grow with the number of tasks but also suffer from potential task interference due to the lack of effective LoRA merging mechanisms. In this paper, we propose a novel continual learning method that orthogonally initializes and sequentially merges LoRA updates into a single unified LoRA. Our method leverages orthogonal basis extraction from the previously learned LoRA to initialize the learning of new tasks, and further exploits the intrinsic asymmetry of LoRA components through a time-aware scaling mechanism that balances new and old knowledge during continual merging. Our approach maintains constant memory complexity with respect to the number of tasks, minimizes interference between past and new tasks via orthogonal basis initialization, and improves performance over asymmetric LoRA merging via adaptive scaling. We provide theoretical analysis to justify our design and conduct extensive experiments across diverse continual learning benchmarks using various Llama models, demonstrating the effectiveness and efficiency of our method.
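The orthogonal basis initialization mentioned in the abstract can be sketched as follows. The construction here (QR-based basis extraction and projection onto the orthogonal complement) is an assumption; the paper's exact procedure may differ. The idea shown is that the new task's A-matrix is initialized orthogonal to the row space of the previously learned adapter, so the two updates live in decoupled subspaces.

```python
import numpy as np

def orthogonal_init(A_old, r_new, rng):
    """Initialize a new task's LoRA A-matrix in the orthogonal complement
    of the previous adapter's row space (hypothetical QR construction)."""
    d = A_old.shape[1]
    # Orthonormal basis of the old row space: columns of Q_old span rows of A_old.
    Q_old, _ = np.linalg.qr(A_old.T)          # shape (d, r_old)
    # Draw random directions and project out the old subspace.
    G = rng.standard_normal((d, r_new))
    G -= Q_old @ (Q_old.T @ G)
    # Orthonormalize what remains; columns are orthogonal to Q_old.
    Q_new, _ = np.linalg.qr(G)                # shape (d, r_new)
    return Q_new.T                            # shape (r_new, d)

rng = np.random.default_rng(0)
A_old = rng.standard_normal((2, 8))
A_new = orthogonal_init(A_old, 2, rng)
print(np.abs(A_new @ A_old.T).max())  # near machine epsilon: subspaces decoupled
```

Because `A_new @ A_old.T` vanishes, gradient updates flowing through the new adapter cannot directly overwrite directions the old adapter used, which is the interference-mitigation argument the abstract makes.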
Problem

Research questions and friction points this paper is trying to address.

Mitigates catastrophic forgetting in LLMs during continual learning
Reduces memory growth and task interference in LoRA methods
Merges multiple task adapters into a single unified parameter set
Innovation

Methods, ideas, or system contributions that make the work stand out.

Single unified LoRA via continual merging
Orthogonal basis extraction for task initialization
Time-aware scaling for knowledge balance
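The three innovations above can be combined into one hypothetical end-to-end loop. Everything here is a sketch under stated assumptions: training is stubbed with random matrices, the orthogonalization uses a QR projection, and the `1 / t` merge scale is an illustrative decay, none of which are confirmed details of the paper. What the sketch demonstrates is the overall shape of the method: per task, orthogonally initialize, train, merge, and keep only one merged matrix plus one live adapter.

```python
import numpy as np

def continual_loop(tasks, d_in=8, d_out=8, r=2, seed=0):
    """Hypothetical single-LoRA continual loop: orthogonally initialize each
    task's adapter against the previous one, 'train' it (stubbed here with a
    random B), then merge with a time-aware scale. Memory is O(1) in the
    number of tasks: only W and one adapter pair exist at any time."""
    rng = np.random.default_rng(seed)
    W = np.zeros((d_out, d_in))
    A_prev = rng.standard_normal((r, d_in))
    for t, _task in enumerate(tasks, start=1):
        # 1) Initialize in the orthogonal complement of the previous row space.
        Q_old, _ = np.linalg.qr(A_prev.T)
        G = rng.standard_normal((d_in, r))
        G -= Q_old @ (Q_old.T @ G)
        Q_new, _ = np.linalg.qr(G)
        A = Q_new.T
        # 2) 'Train' the adapter on the task (random stand-in for B).
        B = rng.standard_normal((d_out, r))
        # 3) Merge with an illustrative time-aware 1/t scale.
        W += (1.0 / t) * (B @ A)
        A_prev = A
    return W

W = continual_loop(["task_a", "task_b", "task_c"])
```

The loop discards each adapter after merging, so storage never grows with the task sequence, matching the single-unified-LoRA claim.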