AI Summary
To address catastrophic forgetting in continual learning of deep neural networks, this paper proposes a novel parameter selection method based on activity analysis during the late-stage plateau phase of training. Unlike conventional approaches that monitor parameter dynamics throughout the entire training process, our method identifies highly active parameters during the plateau phase: these correspond to flat regions of the loss landscape, which facilitate joint optimization of old and new knowledge. We introduce a regularization mechanism that dynamically tracks parameter movement and variability, focusing evaluation on parameter adaptability after convergence. Experiments demonstrate that our approach significantly mitigates forgetting while simultaneously improving performance on new tasks, achieving a superior balance between forward and backward transfer accuracy. By leveraging interpretable, convergence-driven parameter activity, the method establishes a lightweight, efficient, and theoretically grounded paradigm for selective parameter adaptation in continual learning.
Abstract
Catastrophic forgetting in deep neural networks occurs when learning new tasks degrades performance on previously learned tasks due to knowledge overwriting. Among the approaches to mitigate this issue, regularization techniques aim to identify and constrain "important" parameters to preserve previous knowledge. In the highly nonconvex optimization landscape of deep learning, we propose a novel perspective: tracking parameters during the final training plateau is more effective than monitoring them throughout the entire training process. We argue that parameters that exhibit higher activity (movement and variability) during this plateau reveal directions in the loss landscape that are relatively flat, making them suitable for adaptation to new tasks while preserving knowledge from previous ones. Our comprehensive experiments demonstrate that this approach achieves superior performance in balancing catastrophic forgetting mitigation with strong performance on newly learned tasks.
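The abstract describes the selection criterion only in words: score each parameter by its activity (movement and variability) over checkpoints recorded during the final training plateau, then treat the most active parameters as safe directions for adapting to a new task. A minimal sketch of that criterion follows; the function names, the additive combination of movement and variability, and the top-fraction selection rule are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def plateau_activity(snapshots):
    """Score each parameter's activity during the training plateau.

    snapshots: array of shape (T, P) -- flattened parameter vectors
    recorded at T checkpoints after the loss has plateaued.

    Activity here is an assumed combination of movement (mean step
    magnitude between checkpoints) and variability (std of values).
    """
    snapshots = np.asarray(snapshots, dtype=float)
    movement = np.abs(np.diff(snapshots, axis=0)).mean(axis=0)
    variability = snapshots.std(axis=0)
    return movement + variability

def select_adaptable(snapshots, frac=0.2):
    """Return indices of the top-`frac` most active parameters --
    candidates for adaptation to the new task, with the remaining
    (inactive) parameters regularized to preserve old knowledge."""
    scores = plateau_activity(snapshots)
    k = max(1, int(frac * scores.size))
    return np.argsort(scores)[-k:]
```

In a training loop one would snapshot the model's parameters every few steps once the loss curve flattens, then pass the stacked snapshots to `select_adaptable`; parameters outside the returned index set would receive the regularization penalty when the next task is trained.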