🤖 AI Summary
This work addresses catastrophic forgetting in large language models during continual learning, a challenge exacerbated by existing replay methods that rely on fixed-interval strategies ill-suited to the model’s dynamic evolution. To overcome this limitation, the authors propose FOREVER, a novel framework that, for the first time, integrates the Ebbinghaus forgetting curve into continual learning for large language models. FOREVER defines “model time” based on optimizer update magnitudes to dynamically schedule both the timing and intensity of memory replay. This adaptive mechanism is further enhanced by an intensity-aware regularization strategy, aligning memory updates with the model’s internal learning dynamics. Evaluated across three benchmarks on models ranging from 0.6B to 13B parameters, FOREVER consistently and significantly mitigates forgetting, outperforming current state-of-the-art replay-based approaches.
📝 Abstract
Continual learning (CL) for large language models (LLMs) aims to enable sequential knowledge acquisition without catastrophic forgetting. Memory replay methods are widely used for their practicality and effectiveness, but most rely on fixed, step-based heuristics that often misalign with the model's actual learning progress, since identical training steps can result in varying degrees of parameter change. Motivated by recent findings that LLM forgetting mirrors the Ebbinghaus human forgetting curve, we propose FOREVER (FORgEtting curVe-inspired mEmory Replay), a novel CL framework that aligns replay schedules with a model-centric notion of time. FOREVER defines model time using the magnitude of optimizer updates, allowing forgetting curve-inspired replay intervals to align with the model's internal evolution rather than raw training steps. Building on this approach, FOREVER incorporates a forgetting curve-based replay scheduler to determine when to replay and an intensity-aware regularization mechanism to adaptively control how to replay. Extensive experiments on three CL benchmarks and models ranging from 0.6B to 13B parameters demonstrate that FOREVER consistently mitigates catastrophic forgetting.
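The paper does not give FOREVER's exact formulas here, but the core scheduling idea can be sketched in a few lines. The sketch below is an illustrative assumption: it models retention with the classic Ebbinghaus curve R = exp(-t / S), advances "model time" by the magnitude of each optimizer update (e.g. the norm of the parameter delta), and triggers a replay when estimated retention falls below a threshold. All names (`ModelTimeReplayScheduler`, `strength`, `threshold`) are hypothetical, not from the paper.

```python
import math


def ebbinghaus_retention(model_time: float, strength: float) -> float:
    """Ebbinghaus forgetting curve: retention R = exp(-t / S),
    where S is a memory-strength constant (assumed form)."""
    return math.exp(-model_time / strength)


class ModelTimeReplayScheduler:
    """Toy scheduler (not the paper's implementation): accumulates
    'model time' from optimizer update magnitudes and signals a replay
    when estimated retention drops below a threshold."""

    def __init__(self, strength: float = 1.0, threshold: float = 0.5):
        self.strength = strength
        self.threshold = threshold
        self.model_time = 0.0  # model-centric clock, not raw steps

    def step(self, update_magnitude: float) -> bool:
        """Advance model time by this update's magnitude; return True
        if a memory replay should be performed now."""
        self.model_time += update_magnitude
        if ebbinghaus_retention(self.model_time, self.strength) < self.threshold:
            self.model_time = 0.0  # replay refreshes memory; reset the clock
            return True
        return False


# Example: three equal training steps whose updates sum past the
# retention threshold (exp(-0.9) < 0.5), so the third step triggers replay.
sched = ModelTimeReplayScheduler(strength=1.0, threshold=0.5)
print([sched.step(0.3) for _ in range(3)])  # -> [False, False, True]
```

Note how identical step counts with smaller update magnitudes would advance model time more slowly and delay replay, which is the misalignment with step-based heuristics that the abstract points out.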