Replay Can Provably Increase Forgetting

📅 2025-06-04
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Sample replay is widely employed in continual learning to mitigate catastrophic forgetting of old tasks, yet its efficacy remains poorly understood. Method: We theoretically analyze replay in an idealized, noiseless, overparameterized linear regression setting and validate the findings empirically with SGD-trained neural networks on standard benchmarks. Contribution/Results: We establish, for the first time, that replay can *worsen* both worst-case and expected forgetting, demonstrating that its effect can be non-monotonic and even detrimental. The core mechanism is a coupling between task subspace geometry and replay sample selection: when replayed samples deviate from the principal directions of old-task subspaces, they amplify parameter drift and induce negative transfer. Our theoretical forgetting bounds and experiments consistently exhibit this phenomenon. This work challenges the assumption that replay is universally beneficial, showing that its effectiveness hinges on task geometry and replay strategy, and offering theoretical guidance and caution for designing robust replay mechanisms in continual learning.

📝 Abstract
Continual learning seeks to enable machine learning systems to solve an increasing corpus of tasks sequentially. A critical challenge for continual learning is forgetting, where the performance on previously learned tasks decreases as new tasks are introduced. One of the commonly used techniques to mitigate forgetting, sample replay, has been shown empirically to reduce forgetting by retaining some examples from old tasks and including them in new training episodes. In this work, we provide a theoretical analysis of sample replay in an over-parameterized continual linear regression setting, where each task is given by a linear subspace and, with enough replay samples, forgetting can be eliminated. Our analysis focuses on sample replay and highlights the role of the replayed samples and the relationship between task subspaces. Surprisingly, we find that, even in a noiseless setting, forgetting can be non-monotonic with respect to the number of replay samples. We present tasks where replay can be harmful in worst-case settings, and also distributional settings where replay of randomly selected samples increases forgetting in expectation. We also give empirical evidence that harmful replay is not limited to training with linear models by showing similar behavior for neural networks trained with SGD. Through experiments on a commonly used benchmark, we provide additional evidence that, even in seemingly benign scenarios, the performance of replay heavily depends on the choice of replay samples and the relationship between tasks.
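The setting above can be illustrated with a minimal NumPy sketch. The assumptions here are mine, not the paper's exact construction: each update is the minimum-norm least-squares solution (a standard stand-in for (S)GD convergence from the current parameters in the noiseless overparameterized case), tasks are random low-dimensional sample sets sharing one teacher vector, and "forgetting" is measured as task-1 loss after training task 2 with `k` replayed task-1 samples.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 20                       # ambient dimension (overparameterized: d > #samples)
w_star = rng.normal(size=d)  # shared noiseless teacher

# Two tasks, each a handful of samples spanning a low-dimensional subspace.
X1 = rng.normal(size=(5, d))
X2 = rng.normal(size=(5, d))
y1, y2 = X1 @ w_star, X2 @ w_star

def min_norm_update(w, X, y):
    """Move w the minimum distance needed to fit Xw = y exactly.

    This is the closest point to w in the solution set, mimicking where
    gradient descent converges from initialization w in the noiseless case.
    """
    return w + np.linalg.pinv(X) @ (y - X @ w)

def task1_loss(w):
    return float(np.mean((X1 @ w - y1) ** 2))

w = min_norm_update(np.zeros(d), X1, y1)  # train on task 1: fits it exactly

# Train on task 2 with k replayed task-1 samples; track task-1 forgetting.
for k in range(X1.shape[0] + 1):
    Xr = np.vstack([X2, X1[:k]])
    yr = np.concatenate([y2, y1[:k]])
    w_k = min_norm_update(w, Xr, yr)
    print(f"replay k={k}: task-1 loss {task1_loss(w_k):.3e}")
```

With `k = 5` (all task-1 samples replayed) the joint constraints include task 1, so its loss returns to zero, matching the abstract's remark that enough replay eliminates forgetting; for intermediate `k` the loss trace depends on which samples are replayed and how the two subspaces relate, which is exactly the dependence the paper analyzes.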
Problem

Research questions and friction points this paper is trying to address.

Analyzes sample replay's impact on forgetting in continual learning
Identifies forgetting that is non-monotonic in the number of replay samples
Demonstrates harmful replay scenarios in linear and neural models
Innovation

Methods, ideas, or system contributions that make the work stand out.

First theoretical proof that replay can increase forgetting
Shows replay performance depends critically on sample choice and task subspace geometry
Establishes non-monotonic forgetting even in noiseless settings