Asymptotic analysis of shallow and deep forgetting in replay with Neural Collapse

📅 2025-12-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
In continual learning, a critical asymmetry exists between “deep feature forgetting” and “shallow classifier forgetting”: minimal replay buffer sizes (any nonzero ratio) suffice to preserve linear separability of deep features, yet mitigating classifier performance degradation requires substantially larger buffers. Method: Grounded in neural collapse theory, the authors develop an asymptotic feature geometry model analyzing the evolution of class means and covariances; they show that strong collapse induced by small buffers preserves representational separability but severely impairs discriminability. Contribution/Results: They propose the first “deep–shallow forgetting decoupling” framework, interpreting representation stability through the lens of out-of-distribution detection. Crucially, they prove that correcting statistical bias—induced by nonstationary class priors under low replay rates—enables robust classification even with minimal buffers. This provides both theoretical foundations and practical optimization principles for efficient experience replay in continual learning.

📝 Abstract
A persistent paradox in continual learning (CL) is that neural networks often retain linearly separable representations of past tasks even when their output predictions fail. We formalize this distinction as the gap between deep feature-space and shallow classifier-level forgetting. We reveal a critical asymmetry in Experience Replay: while minimal buffers successfully anchor feature geometry and prevent deep forgetting, mitigating shallow forgetting typically requires substantially larger buffer capacities. To explain this, we extend the Neural Collapse framework to the sequential setting. We characterize deep forgetting as a geometric drift toward out-of-distribution subspaces and prove that any non-zero replay fraction asymptotically guarantees the retention of linear separability. Conversely, we identify that the "strong collapse" induced by small buffers leads to rank-deficient covariances and inflated class means, effectively blinding the classifier to true population boundaries. By unifying CL with out-of-distribution detection, our work challenges the prevailing reliance on large buffers, suggesting that explicitly correcting these statistical artifacts could unlock robust performance with minimal replay.
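The gap the abstract describes, features that stay linearly separable while the classifier head fails, can be illustrated with a small synthetic sketch. Everything here is hypothetical (the drifted weight vector, the 2-D Gaussian features, the nearest-class-mean probe are illustration choices, not the paper's setup); it only shows how a refit linear probe can succeed on the same features where a stale head is near chance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-D features of an old task after continual training:
# the two classes remain linearly separable (deep geometry preserved).
old_a = rng.normal([+2.0, 0.0], 0.3, size=(200, 2))
old_b = rng.normal([-2.0, 0.0], 0.3, size=(200, 2))
X = np.vstack([old_a, old_b])
y = np.array([0] * 200 + [1] * 200)

# "Shallow forgetting": a drifted classifier head whose decision
# boundary no longer matches the feature geometry.
w_drifted = np.array([0.0, 1.0])  # orthogonal to the true separating axis
acc_drifted = np.mean((X @ w_drifted > 0).astype(int) == y)

# "Deep features intact": a nearest-class-mean probe refit on the very
# same features recovers the separability the drifted head misses.
mu_a, mu_b = old_a.mean(axis=0), old_b.mean(axis=0)
pred_probe = np.where(
    np.linalg.norm(X - mu_a, axis=1) < np.linalg.norm(X - mu_b, axis=1), 0, 1
)
acc_probe = np.mean(pred_probe == y)

print(f"drifted head accuracy: {acc_drifted:.2f}")
print(f"refit probe accuracy:  {acc_probe:.2f}")
```

Running this shows the drifted head near 50% while the refit probe is near perfect on identical features, which is exactly the deep/shallow distinction the abstract formalizes.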
Problem

Research questions and friction points this paper is trying to address.

Analyzes forgetting types in continual learning with replay
Explains asymmetry in buffer needs for feature vs classifier forgetting
Proposes minimal replay with statistical correction for robust performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extends Neural Collapse to sequential continual learning
Proves minimal replay prevents deep forgetting of features
Identifies small buffers cause statistical artifacts in classifiers
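The third bullet, small buffers inducing statistical artifacts that a correction can undo, resembles classic prior-shift correction of logits. The sketch below is an assumption-laden illustration (the skewed `train_prior`, the Gaussian logit model, and the log-prior adjustment are standard logit-adjustment ingredients, not the paper's stated procedure): a classifier whose bias absorbs a skewed replay prior is near-useless on the minority class until the prior term is swapped for the balanced evaluation prior.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setting: a low replay rate makes the old-task class rare
# in the training stream, skewing the classifier toward the new class.
train_prior = np.array([0.05, 0.95])  # old class underrepresented
test_prior = np.array([0.5, 0.5])     # balanced at evaluation time

# Simulated logits: class-conditional means plus noise, with the bias
# term having absorbed log(train_prior) during training.
mu = np.array([[+1.0, -1.0], [-1.0, +1.0]])
y = rng.integers(0, 2, size=2000)
logits = mu[y] + rng.normal(0.0, 1.0, size=(2000, 2)) + np.log(train_prior)

acc_raw = np.mean(logits.argmax(axis=1) == y)

# Prior correction (logit adjustment): remove the training-prior bias
# and insert the evaluation-time prior.
adjusted = logits - np.log(train_prior) + np.log(test_prior)
acc_adj = np.mean(adjusted.argmax(axis=1) == y)

print(f"raw accuracy:      {acc_raw:.2f}")
print(f"adjusted accuracy: {acc_adj:.2f}")
```

The adjustment is a single additive shift per class, which is why such corrections are cheap relative to growing the buffer.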
Giulia Lanzillotta
PhD fellow at ETH AI Center
continual learning · bio-inspired learning · general artificial intelligence
Damiano Meier
ETH AI Center & Department of Computer Science, ETH Zürich
Thomas Hofmann
ETH AI Center & Department of Computer Science, ETH Zürich