🤖 AI Summary
This work identifies the fundamental mechanism underlying “plasticity loss” in deep neural networks during continual learning: spectral collapse of the Hessian matrix, which degrades the parameter space’s responsiveness to gradients from new tasks. To address this, we propose the τ-trainability theoretical framework—unifying plasticity preservation under a single, principled lens—and develop a Kronecker-factored Hessian spectral analysis method. Building on this, we introduce a joint strategy: (i) preserving effective feature rank to delay spectral collapse, and (ii) applying adaptive L2 regularization to suppress overfitting along uninformative parameter directions. Experiments demonstrate substantial improvements in plasticity retention and cross-task generalization across diverse continual learning and reinforcement learning benchmarks. Our approach provides an interpretable, scalable, and theoretically grounded paradigm for plasticity regulation—bridging mechanistic insight with practical algorithmic design.
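One of the two regularizers above relies on the *effective rank* of a feature matrix. A common definition (not necessarily the exact one used in the paper) is the exponential of the entropy of the normalized singular-value distribution; a minimal sketch, assuming that definition and illustrative data:

```python
import numpy as np

def effective_rank(features, eps=1e-12):
    """Effective rank = exp(entropy of the normalized singular values).
    Values near the full rank indicate a rich feature space; a drop
    toward 1 signals the kind of collapse the regularizer delays."""
    s = np.linalg.svd(features, compute_uv=False)
    p = s / (s.sum() + eps)                      # singular values as a distribution
    entropy = -(p * np.log(p + eps)).sum()
    return float(np.exp(entropy))

rng = np.random.default_rng(0)
rich = rng.normal(size=(256, 64))                            # nearly full-rank batch of features
collapsed = np.outer(rng.normal(size=256), rng.normal(size=64))  # rank-1 features
print(effective_rank(rich), effective_rank(collapsed))       # high vs. ~1
```

Because effective rank is differentiable in the singular values, a penalty encouraging it to stay high can be added to the training loss alongside the L2 term.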
📝 Abstract
We investigate why deep neural networks suffer from *loss of plasticity* in continual learning, failing to learn new tasks without reinitializing parameters. We show that this failure is preceded by Hessian spectral collapse at new-task initialization, where meaningful curvature directions vanish and gradient descent becomes ineffective. To characterize the necessary condition for successful training, we introduce the notion of $\tau$-trainability and show that current plasticity-preserving algorithms can be unified under this framework. Targeting spectral collapse directly, we then discuss the Kronecker-factored approximation of the Hessian, which motivates two regularization enhancements: maintaining high effective feature rank and applying $L_2$ penalties. Experiments on continual supervised and reinforcement learning tasks confirm that combining these two regularizers effectively preserves plasticity.
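The Kronecker-factored approximation mentioned above makes the layer-wise Hessian spectrum cheap to inspect: for a linear layer, the Hessian (or Fisher) block is approximated as $A \otimes G$, where $A$ is the second-moment matrix of layer inputs and $G$ that of backpropagated output gradients (as in K-FAC), and the eigenvalues of a Kronecker product are all pairwise products of the factor eigenvalues. A minimal sketch with synthetic data standing in for a real layer's activations and gradients:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_in, d_out = 512, 32, 16
a = rng.normal(size=(n, d_in))   # layer inputs over a batch (illustrative)
g = rng.normal(size=(n, d_out))  # gradients w.r.t. layer outputs (illustrative)

A = a.T @ a / n                  # (d_in, d_in) input second-moment matrix
G = g.T @ g / n                  # (d_out, d_out) gradient second-moment matrix

# Eigenvalues of A ⊗ G are the pairwise products of the factor eigenvalues,
# so the full (d_in * d_out)-dimensional spectrum never requires forming
# the large Kronecker product explicitly.
eig_A = np.linalg.eigvalsh(A)
eig_G = np.linalg.eigvalsh(G)
spectrum = np.sort(np.outer(eig_A, eig_G).ravel())

# Spectral collapse shows up as most eigenvalues shrinking toward zero;
# one simple diagnostic is the fraction above a relative threshold.
frac_alive = (spectrum > 1e-3 * spectrum.max()).mean()
print(spectrum.size, frac_alive)
```

Monitoring this approximate spectrum per layer at each new-task boundary is one way to detect collapse before plasticity is visibly lost.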