🤖 AI Summary
In continual learning, deep neural networks suffer from catastrophic forgetting: rapid degradation of performance on previously learned tasks. Existing parameter-protection methods face two critical bottlenecks: storage overhead that scales linearly with the number of tasks, and difficulty in precisely identifying task-critical parameters. This paper introduces a paradigm shift: “paths matter more than parameters.” Instead of protecting individual parameters, we model and preserve the sparse activation pathways that encode knowledge from prior tasks. We formulate model fusion as a graph-matching problem, enabling path-level knowledge retention without parameter isolation. Our approach integrates sparse activation modeling with adaptive channel allocation, achieving superior retention of previously learned tasks on mainstream benchmarks with significantly fewer stored parameters. It effectively mitigates catastrophic forgetting while ensuring both knowledge stability and parameter efficiency.
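To make the graph-matching idea concrete, here is a minimal sketch (not the paper's actual implementation) of fusing two layers by first aligning their output channels via bipartite matching. The cost matrix, the Hungarian solver, and the `fuse_layers` helper are all illustrative assumptions; the paper's formulation may differ.

```python
# Illustrative sketch: channel alignment as bipartite graph matching
# before model fusion. Names and the averaging rule are assumptions,
# not the paper's exact method.
import numpy as np
from scipy.optimize import linear_sum_assignment

def fuse_layers(w_old: np.ndarray, w_new: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Fuse two weight matrices of shape (out_channels, in_features).

    Channels of `w_new` are first permuted to best match `w_old`
    (a linear assignment, i.e. bipartite graph matching), then the
    aligned weights are averaged.
    """
    # cost[i, j] = squared L2 distance between old channel i and new channel j
    cost = ((w_old[:, None, :] - w_new[None, :, :]) ** 2).sum(axis=-1)
    _, col_idx = linear_sum_assignment(cost)  # optimal channel matching
    w_new_aligned = w_new[col_idx]            # permute new channels into place
    return alpha * w_old + (1 - alpha) * w_new_aligned

# Toy usage: two layers with 4 output channels and 8 input features.
rng = np.random.default_rng(0)
w_a, w_b = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
print(fuse_layers(w_a, w_b).shape)  # (4, 8)
```

Matching before averaging matters because two independently trained networks may encode the same feature in differently indexed channels; naive averaging would then blend unrelated features.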
📝 Abstract
Deep networks are prone to catastrophic forgetting during sequential task learning, i.e., losing knowledge of old tasks upon learning new ones. Continual learning (CL) has emerged to address this, and its existing methods focus mostly on regulating or protecting the parameters associated with previous tasks. However, parameter protection is often impractical: the storage required for old-task parameters grows linearly with the number of tasks, and it is otherwise difficult to identify which parameters actually encode old-task knowledge. In this work, we bring a dual insight from neuroscience and physics to CL: within a network, the pathways matter more than the individual parameters where the knowledge acquired from old tasks is concerned. Following this insight, we propose a novel CL framework, learning without isolation (LwI), in which model fusion is formulated as graph matching and the pathways occupied by old tasks are protected without being isolated. Thanks to the sparsity of activation channels in a deep network, LwI can adaptively allocate available pathways to a new task, realizing pathway protection and addressing catastrophic forgetting in a parameter-efficient manner. Experiments on popular benchmark datasets demonstrate the superiority of the proposed LwI.
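The abstract's claim that activation sparsity leaves pathways free for new tasks can be illustrated with a small sketch. The thresholding rule and the `allocate_channels` helper below are assumptions for illustration only, not LwI's actual allocation procedure.

```python
# Illustrative sketch: using activation sparsity to find channels
# (pathways) that old tasks barely use and that can be allocated to a
# new task. The threshold and helper name are assumptions.
import numpy as np

def allocate_channels(activations: np.ndarray, threshold: float = 0.05) -> np.ndarray:
    """Return a boolean mask of channels available for a new task.

    `activations` has shape (num_samples, num_channels): per-channel
    responses recorded while running old-task data through a layer.
    """
    channel_activity = np.abs(activations).mean(axis=0)  # mean |activation| per channel
    return channel_activity < threshold                  # weakly used -> free to reuse

# Toy usage: 6 channels, half of which are nearly silent on old-task inputs.
rng = np.random.default_rng(0)
acts = rng.normal(size=(100, 6)) * np.array([1.0, 0.01, 1.0, 0.02, 0.0, 1.0])
print(allocate_channels(acts))  # -> [False  True False  True  True False]
```

Under this reading, channels that stay quiet on old-task data can host a new task's pathways, while the active pathways of old tasks are preserved without freezing any individual parameter outright.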