SuRe: Surprise-Driven Prioritised Replay for Continual LLM Learning

📅 2025-11-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
In continual learning, large language models (LLMs) suffer from catastrophic forgetting and inefficient rehearsal, especially over long task sequences where conventional regularisation and rehearsal methods fail. To address this, we propose SuRe, a framework featuring (i) a surprise-based prioritised rehearsal mechanism, with surprise defined as a sequence's negative log-likelihood (NLL), that dynamically selects the most informative samples for replay; and (ii) a fast–slow dual-LoRA adapter architecture with an exponential moving average (EMA) that enables progressive fusion of new and old knowledge. SuRe achieves efficient knowledge consolidation and rapid task adaptation under stringent resource constraints, i.e., a small replay buffer (≤100 samples) and sparse rehearsal frequency. Evaluated on standard and large-scale continual learning benchmarks, SuRe establishes new state-of-the-art performance: in the Large Number of Tasks (LNT) setting, it improves accuracy by +5.0 percentage points over prior best methods, demonstrating strong robustness and scalability.
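The selection rule described above can be sketched in a few lines. This is a minimal illustration under assumed interfaces (the class name, `capacity` default, and per-token log-prob input are not from the paper): a fixed-size buffer keeps the most surprising, i.e. highest-NLL, sequences seen so far.

```python
import heapq

def sequence_nll(token_logprobs):
    """Negative log-likelihood of a sequence from its per-token log-probs."""
    return -sum(token_logprobs)

class SurpriseReplayBuffer:
    """Sketch of surprise-prioritised selection (assumed interface, not the
    authors' implementation): retain the `capacity` highest-NLL sequences."""

    def __init__(self, capacity=100):
        self.capacity = capacity
        self.heap = []     # min-heap of (nll, tie, seq); least surprising at root
        self.counter = 0   # tie-breaker so heapq never compares sequences

    def add(self, seq, token_logprobs):
        item = (sequence_nll(token_logprobs), self.counter, seq)
        self.counter += 1
        if len(self.heap) < self.capacity:
            heapq.heappush(self.heap, item)
        elif item[0] > self.heap[0][0]:
            heapq.heapreplace(self.heap, item)  # evict the least surprising entry

    def sample(self):
        return [seq for _, _, seq in self.heap]
```

Using a min-heap keyed on NLL makes each insertion O(log capacity), so the buffer scales to long task sequences while staying within the ≤100-sample budget.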

📝 Abstract
Continual learning, the ability to adapt to a sequence of tasks without forgetting previously acquired knowledge, remains a major challenge in machine learning and a key gap between artificial and human intelligence. While regularisation and replay perform well in vision, they lag behind multi-task learning for large language models (LLMs), especially at scale with many tasks. We revisit replay and argue that two failure modes drive this gap: selection (what to rehearse) and integration (how to consolidate new knowledge). To address selection, we propose Surprise-prioritised Replay (SuRe), a simple, architecture-agnostic rule that ranks and stores the most surprising (highest negative log-likelihood) sequences. SuRe achieves state-of-the-art performance in the Large Number of Tasks (LNT) setting and delivers the best overall average across both Standard CL and LNT benchmarks. To address integration, we add a dual-learner design with fast and slow LoRA adapters merged via an exponential moving average (EMA), enabling rapid adaptation while stabilising long-term knowledge. Combining SuRe with the dual learner yields further gains, including improvements of up to +5 accuracy points on LNT over prior SOTA. Ablation studies confirm that our proposed method remains robust under reduced replay frequency and small buffer sizes, demonstrating both effectiveness and sample efficiency. Taken together, our results establish replay as a strong baseline for continual LLM fine-tuning and demonstrate that surprise-based selection and slow-weight consolidation are complementary components for mitigating catastrophic forgetting.
Problem

Research questions and friction points this paper is trying to address.

Addresses catastrophic forgetting in continual learning for large language models.
Improves replay selection using surprise-driven prioritization of sequences.
Enhances knowledge integration with dual-learner design and slow-weight consolidation.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Surprise-prioritized replay selects high-NLL sequences for storage.
Dual-learner design uses fast and slow LoRA adapters merged via EMA.
Combined approach improves accuracy and mitigates catastrophic forgetting.
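The slow-weight consolidation in the dual-learner design can be illustrated with a short sketch. The decay value and the flat weight dictionaries are assumptions for illustration, not the paper's hyperparameters: the slow adapter tracks an exponential moving average of the fast adapter's parameters, so new knowledge is folded in gradually.

```python
def ema_update(slow, fast, decay=0.99):
    """EMA consolidation sketch (assumed `decay`, not the paper's value):
    the slow adapter drifts toward the fast adapter by a factor (1 - decay)
    per update, stabilising long-term knowledge against rapid changes."""
    return {name: decay * slow[name] + (1.0 - decay) * fast[name]
            for name in slow}

# Example: after one update with decay=0.9, the slow weight moves
# 10% of the way toward the fast weight.
slow = {"lora_A": 0.0}
fast = {"lora_A": 1.0}
slow = ema_update(slow, fast, decay=0.9)
```

In practice the same rule would be applied per tensor to the fast LoRA adapter's A and B matrices after each adaptation step; the high decay keeps the slow learner a stable consolidation target while the fast learner adapts quickly.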
Hugo Hazard
AI Centre, Department of Computer Science, University College London, London, UK
Zafeirios Fountas
Principal Research Scientist, Huawei Technologies, London
Artificial intelligence · Theoretical neuroscience · Machine learning · Memory · Time perception
Martin A. Benfeghoul
Huawei Noah’s Ark Lab, London, UK
Adnan Oomerjee
Huawei Noah’s Ark Lab, London, UK
Jun Wang
AI Centre, Department of Computer Science, University College London, London, UK
Haitham Bou-Ammar
RL, BO, and MAS Team Leader, Huawei Noah's Ark Lab; H. Assistant Professor @ UCL
Machine Learning · Reinforcement Learning · Optimisation · Variational Inference