Learning without Isolation: Pathway Protection for Continual Learning

📅 2025-05-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
In continual learning, deep neural networks suffer from catastrophic forgetting: rapid degradation of performance on previously learned tasks. Existing parameter-protection methods face critical bottlenecks: storage overhead that scales linearly with the number of tasks, and difficulty in precisely identifying task-critical parameters. This paper introduces a paradigm shift: "pathways matter more than parameters." Instead of protecting individual parameters, the authors model and preserve the sparse activation pathways that encode knowledge from prior tasks, formulating model fusion as a graph-matching problem to enable non-isolated, pathway-level knowledge retention. The approach integrates sparse activation modeling with adaptive channel allocation, achieving superior retention of previously learned tasks on mainstream benchmarks while storing significantly fewer parameters. It effectively mitigates catastrophic forgetting while ensuring both knowledge stability and parameter efficiency.

📝 Abstract
Deep networks are prone to catastrophic forgetting during sequential task learning, i.e., losing knowledge about old tasks upon learning new ones. To address this, continual learning (CL) has emerged; its existing methods focus mostly on regulating or protecting the parameters associated with previous tasks. However, parameter protection is often impractical: either the storage required for old-task parameters grows linearly with the number of tasks, or it is hard to preserve exactly the parameters that carry the old-task knowledge. In this work, we bring a dual perspective from neuroscience and physics to CL: across the whole network, the pathways matter more than the parameters where knowledge acquired from old tasks is concerned. Following this perspective, we propose a novel CL framework, learning without isolation (LwI), in which model fusion is formulated as graph matching and the pathways occupied by old tasks are protected without being isolated. Thanks to the sparsity of activation channels in a deep network, LwI can adaptively allocate available pathways to a new task, realizing pathway protection and addressing catastrophic forgetting in a parameter-efficient manner. Experiments on popular benchmark datasets demonstrate the superiority of the proposed LwI.
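The abstract's premise that activation channels in a deep network are sparse can be illustrated with a toy check: only a small fraction of channels carry substantial activation for a given task, leaving the rest free to be allocated to new tasks. The `active_channels` helper and its 10%-of-max threshold are illustrative assumptions for this sketch, not the paper's actual channel-selection criterion.

```python
import numpy as np

def active_channels(activations, threshold=0.1):
    """Return indices of channels whose mean |activation| over a batch
    exceeds a fraction of the strongest channel's mean |activation|."""
    strength = np.abs(activations).mean(axis=0)
    return np.flatnonzero(strength > threshold * strength.max())

# Synthetic batch: 32 samples, 64 channels, with roughly 20% of
# channels active (the rest zeroed out, mimicking sparse pathways).
rng = np.random.default_rng(1)
mask = rng.random(64) < 0.2
acts = rng.normal(size=(32, 64)) * mask

frac_active = len(active_channels(acts)) / acts.shape[1]
```

Under this toy setup, `frac_active` stays well below half, leaving most channels unoccupied; the paper's point is that such unused capacity can be allocated to new tasks without overwriting old pathways.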
Problem

Research questions and friction points this paper is trying to address.

Addresses catastrophic forgetting in deep networks during sequential task learning
Protects pathways instead of parameters to preserve old-task knowledge
Proposes parameter-efficient pathway protection via adaptive allocation and model fusion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Protects old-task pathways via graph-matching-based model fusion
Adaptively allocates free pathways to new tasks
Exploits activation sparsity to address forgetting parameter-efficiently
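The bullets above describe fusing models by matching pathways rather than isolating them. A minimal sketch of one common instantiation of channel-level matching: align the output channels of two task-specific layers with the Hungarian algorithm on row-wise cosine similarity, then average the aligned weights. The function names, the similarity criterion, and the simple averaging step are assumptions for illustration, not the paper's exact graph-matching formulation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_channels(w_a, w_b):
    """Find the permutation of w_b's output channels (rows) that best
    aligns them with w_a's, by maximizing total cosine similarity."""
    a = w_a / (np.linalg.norm(w_a, axis=1, keepdims=True) + 1e-8)
    b = w_b / (np.linalg.norm(w_b, axis=1, keepdims=True) + 1e-8)
    sim = a @ b.T
    # Hungarian algorithm minimizes cost, so negate the similarity.
    _, col = linear_sum_assignment(-sim)
    return col

def fuse_layer(w_a, w_b, alpha=0.5):
    """Average two layers after aligning w_b's channels to w_a's."""
    perm = match_channels(w_a, w_b)
    return alpha * w_a + (1 - alpha) * w_b[perm]

# Sanity check: if w_b is just w_a with shuffled channels, the
# matching should recover the shuffle and fusion should return w_a.
rng = np.random.default_rng(0)
w_a = rng.normal(size=(4, 8))
w_b = w_a[[2, 0, 3, 1]]
fused = fuse_layer(w_a, w_b)
```

In this toy case `fused` equals `w_a` exactly, because the matching undoes the channel permutation before averaging; with genuinely different task models, the same alignment step is what lets two sets of pathways be merged without one overwriting the other.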
Authors

Zhikang Chen
Tsinghua University, Beijing, P.R. China
Abudukelimu Wuerkaixi
Tsinghua University, Beijing, P.R. China
Sen Cui
Tsinghua University
Haoxuan Li
Peking University
Ding Li
Tsinghua University, Beijing, P.R. China
Jingfeng Zhang
The University of Auckland
Bo Han
Hong Kong Baptist University
Gang Niu
RIKEN
Houfang Liu
Tsinghua University, Beijing, P.R. China
Yi Yang
Tsinghua University, Beijing, P.R. China
Sifan Yang
Nanjing University
Changshui Zhang
Dept. of Automation, Tsinghua University, Beijing, China
Tianling Ren
Tsinghua University, Beijing, P.R. China