🤖 AI Summary
To address catastrophic forgetting in continual learning of language models, caused by existing LoRA-based methods forcing old and new adaptation branches to contribute equally to prior tasks, this paper proposes GainLoRA, a gated integration mechanism. GainLoRA expands a dedicated LoRA branch for each new task and introduces a learnable gating module that suppresses the new branch's output on previously seen tasks, enabling task-discriminative weighting of the LoRA branches. Crucially, it requires no access to historical task data or replay buffers and remains fully compatible with parameter-efficient fine-tuning. On standard continual learning benchmarks, GainLoRA achieves a 3.2% absolute improvement in average accuracy and reduces forgetting by 41% relative to prior state-of-the-art methods, mitigating inter-task interference while keeping the adaptation lightweight and improving both stability and generalization.
📝 Abstract
Continual learning (CL), which requires the model to learn multiple tasks sequentially, is crucial for language models (LMs). Recently, low-rank adaptation (LoRA), one of the most representative parameter-efficient fine-tuning (PEFT) methods, has gained increasing attention in CL of LMs. However, most existing CL methods based on LoRA typically expand a new LoRA branch to learn each new task and force the new and old LoRA branches to contribute equally to old tasks, potentially leading to forgetting. In this work, we propose a new method, called gated integration of low-rank adaptation (GainLoRA), for CL of LMs. GainLoRA expands a new LoRA branch for each new task and introduces gating modules to integrate the new and old LoRA branches. Furthermore, GainLoRA leverages the new gating module to minimize the contribution from the new LoRA branch to old tasks, effectively mitigating forgetting and improving the model's overall performance. Experimental results on CL benchmarks demonstrate that GainLoRA outperforms existing state-of-the-art methods.
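The core idea, one LoRA branch per task combined through gates, with the newest gate driven toward zero on old-task inputs, can be illustrated with a minimal numerical sketch. The scalar sigmoid gates, the dimensions, and the `forward` helper below are illustrative assumptions for exposition; the paper's actual gating-module architecture and training objective are not specified in this abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2  # hidden dim and LoRA rank (illustrative sizes, not from the paper)

W = rng.standard_normal((d, d)) * 0.1  # frozen pretrained weight

# one low-rank LoRA branch (B @ A) per task seen so far
branches = [(rng.standard_normal((d, r)) * 0.1,
             rng.standard_normal((r, d)) * 0.1) for _ in range(3)]

# hypothetical gating module per branch: scalar gate g(x) = sigmoid(v . x + b)
gates = [(rng.standard_normal(d), 0.0) for _ in branches]

def forward(x, active=None):
    """Frozen backbone output plus a gated sum of LoRA branch outputs."""
    idx = range(len(branches)) if active is None else active
    out = W @ x
    for i in idx:
        B, A = branches[i]
        v, b = gates[i]
        g = 1.0 / (1.0 + np.exp(-(v @ x + b)))  # gate value in (0, 1)
        out = out + g * (B @ (A @ x))
    return out

x = rng.standard_normal(d)  # stand-in for an input from an old task

# Drive the newest gate toward 0, as GainLoRA's objective would on
# old-task inputs: the new branch then no longer perturbs old tasks.
gates[-1] = (np.zeros(d), -50.0)
```

With the newest gate saturated near zero, the full forward pass matches the output computed from only the old branches, which is exactly the "minimize the new branch's contribution to old tasks" behavior the abstract describes.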