Continual Fine-Tuning of Large Language Models via Program Memory

📅 2026-05-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a challenge in continual learning: existing low-rank adaptation (LoRA) methods struggle to balance rapid adaptation to new tasks against retention of previously acquired knowledge, which often leads to catastrophic forgetting. It proposes ProCL, a continual LoRA framework inspired by complementary learning systems theory from neuroscience. ProCL organizes LoRA adapters into structured program memory slots and retrieves them dynamically through input-conditioned attention, so that semantically similar inputs reuse shared adapter regions while unused capacity is reserved for future data. The retrieved slots are combined with an underlying adapter that gradually accumulates knowledge across tasks, balancing plasticity and stability. The method stays entirely within the LoRA parameterization and adds no inference cost. Experiments on diverse benchmarks show improved retention and reduced catastrophic forgetting relative to other continual LoRA strategies.

📝 Abstract

Parameter-Efficient Fine-Tuning (PEFT), particularly Low-Rank Adaptation (LoRA), has become a standard approach for adapting Large Language Models (LLMs) under limited compute. However, in continual settings where models are updated sequentially with small datasets, conventional LoRA updates struggle to balance rapid adaptation and knowledge retention. Existing methods typically treat the low-rank space as a homogeneous update region, lacking mechanisms to regulate how short-term updates are consolidated over time. We propose a continual LoRA framework with Program memory, inspired by Complementary Learning Systems in neuroscience. Our approach, dubbed ProCL, organizes LoRA adapters into structured program memory slots that are dynamically retrieved through input-conditioned attention. This enables rapid and localized adaptation, encouraging similar inputs to reuse shared adapter regions while reserving unused capacity for future data. The slots are then combined with the underlying adapter, which maintains a distributed representation that gradually accumulates knowledge across tasks to balance plasticity and stability. Our method operates entirely within the LoRA parameterization and incurs no additional inference cost. Experiments on diverse benchmarks demonstrate improved retention and reduced catastrophic forgetting over other continual LoRA strategies.
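
To make the mechanism in the abstract concrete, here is a minimal NumPy sketch of one way slot adapters could be retrieved by input-conditioned attention and combined with an underlying adapter. It is not the authors' implementation. The class name `SlotLoRA`, the learned slot keys, the scaled dot-product softmax retrieval, the additive combination, and all shapes are assumptions made for illustration; the abstract does not specify any of them, and the sketch does not attempt to reproduce the claim of no additional inference cost.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over the last axis."""
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


class SlotLoRA:
    """Hypothetical layer: frozen weight + shared LoRA adapter + K slot adapters.

    Each slot holds its own low-rank pair (A_k, B_k) and a key vector.
    Inputs attend over the keys, so similar inputs weight the same slots.
    """

    def __init__(self, d_in, d_out, rank, num_slots, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.02, (d_in, d_out))  # frozen pretrained weight
        self.A = rng.normal(0.0, 0.02, (d_in, rank))   # shared adapter, down-projection
        self.B = np.zeros((rank, d_out))               # shared adapter, zero-initialized
        self.slot_keys = rng.normal(0.0, 1.0, (num_slots, d_in))    # assumed retrieval keys
        self.slot_A = rng.normal(0.0, 0.02, (num_slots, d_in, rank))
        self.slot_B = np.zeros((num_slots, rank, d_out))            # zero-initialized

    def retrieve(self, x):
        """Attention weights over slots, shape (batch, num_slots)."""
        scores = x @ self.slot_keys.T / np.sqrt(x.shape[-1])
        return softmax(scores)

    def forward(self, x):
        attn = self.retrieve(x)
        # Frozen path plus the shared (slowly accumulating) adapter.
        base = x @ self.W + (x @ self.A) @ self.B
        # Low-rank update from every slot, shape (batch, num_slots, d_out).
        slot_out = np.einsum("bd,kdr,kro->bko", x, self.slot_A, self.slot_B)
        # Attention-weighted mixture of the slot updates.
        return base + np.einsum("bk,bko->bo", attn, slot_out)


layer = SlotLoRA(d_in=8, d_out=4, rank=2, num_slots=3)
x = np.random.default_rng(1).normal(size=(5, 8))
y = layer.forward(x)  # shape (5, 4)
```

Because every `B` matrix starts at zero, the layer initially reproduces the frozen weight exactly, which is the usual LoRA starting point. In this sketch, slots that receive little attention mass would see small gradients during training, which is one plausible reading of "reserving unused capacity for future data".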

Problem

Research questions and friction points this paper is trying to address.

Continual Learning

Large Language Models

Parameter-Efficient Fine-Tuning

Catastrophic Forgetting

LoRA

Innovation

Methods, ideas, or system contributions that make the work stand out.

Continual Learning

LoRA

Program Memory