🤖 AI Summary
This work addresses the persistent challenge of catastrophic forgetting in continual learning, which can occur even when full-data replay is employed, a phenomenon that has so far lacked a clear theoretical explanation. Building on a feature learning perspective, the authors develop a theoretical framework that integrates a multi-view data model with signal-to-noise ratio (SNR) analysis. They formally demonstrate, for the first time, that forgetting arises because accumulated noise eventually dominates the signal from earlier tasks. The study further reveals that prioritizing high-SNR tasks during learning not only enhances performance on low-SNR tasks but also mitigates forgetting, leading to a novel SNR-based task-ordering principle. Both theoretical analysis and experiments, spanning synthetic and real-world datasets, confirm that sufficient accumulation of informative signals enables replay to effectively recover the performance of previously under-learned tasks.
📝 Abstract
Continual learning (CL) aims to train models on a sequence of tasks while retaining performance on previously learned ones. A core challenge in this setting is catastrophic forgetting, where new learning interferes with past knowledge. Among various mitigation strategies, data-replay methods, in which past samples are periodically revisited, are considered simple yet effective, especially when memory constraints are relaxed. However, the theoretical effectiveness of full data replay, where all past data is accessible during training, remains largely unexplored. In this paper, we present a comprehensive theoretical framework for analyzing full data-replay training in continual learning from a feature learning perspective. Adopting a multi-view data model, we identify the signal-to-noise ratio (SNR) as a critical factor affecting forgetting. Focusing on task-incremental binary classification across $M$ tasks, our analysis verifies two key conclusions: (1) forgetting can still occur under full replay when the cumulative noise from later tasks dominates the signal from earlier ones; and (2) with sufficient signal accumulation, data replay can recover earlier tasks, even if their initial learning was poor. Notably, we uncover a novel insight into task ordering: prioritizing higher-signal tasks not only facilitates learning of lower-signal tasks but also helps prevent catastrophic forgetting. We validate our theoretical findings through synthetic and real-world experiments that visualize the interplay between signal learning and noise memorization across varying SNRs and task correlation regimes.
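The first conclusion, that cumulative noise from later tasks can drown out an earlier task's signal even when its data remains in the replay buffer, can be illustrated with a deliberately simplified toy. The sketch below is not the paper's multi-view model or analysis; it only assumes, hypothetically, that each task contributes an SNR-scaled signal direction plus a unit-norm noise vector to the learned weights, and measures how well the final weights stay aligned with the first task's signal as more low-SNR tasks accumulate.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 1000  # feature dimension (arbitrary choice for the toy)

# Fixed signal direction for the earliest task ("task 0").
u0 = rng.standard_normal(d)
u0 /= np.linalg.norm(u0)

def task0_alignment(snr0, n_later_tasks, later_snr):
    """Cosine alignment of the accumulated weights with task 0's signal.

    Each later task adds a random unit signal scaled by its SNR plus a
    unit-norm noise vector, mimicking signal learning vs. noise memorization.
    """
    w = snr0 * u0
    for _ in range(n_later_tasks):
        sig = rng.standard_normal(d)
        sig /= np.linalg.norm(sig)
        noise = rng.standard_normal(d)
        noise /= np.linalg.norm(noise)
        w += later_snr * sig + noise
    return float(w @ u0 / np.linalg.norm(w))

# With more low-SNR later tasks, alignment with task 0's signal decays,
# even though task 0's contribution is never removed (full "replay").
for m in (0, 5, 50):
    print(f"{m:>3} later low-SNR tasks -> task-0 alignment "
          f"{task0_alignment(snr0=1.0, n_later_tasks=m, later_snr=0.2):.3f}")
```

In this toy, the alignment shrinks roughly like $1/\sqrt{M}$ as noisy tasks pile up, which is the qualitative effect the abstract describes; the paper's actual result characterizes this precisely within its multi-view data model.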