AI Summary
This work proposes Enhanced Task-Continual Learning (ETCL), a novel approach that addresses the limitations of existing continual learning methods, which primarily focus on mitigating catastrophic forgetting while neglecting systematic promotion of both forward and backward knowledge transfer. ETCL isolates sparse subnetworks via task-specific binary masks and integrates gradient alignment, orthogonal projection, and a dual-objective optimization framework. Notably, it provides the first theoretical characterization of the boundary of negative knowledge transfer and introduces an online task similarity detection mechanism. Extensive experiments demonstrate that ETCL significantly outperforms strong baselines across sequences of similar, dissimilar, and mixed tasks, achieving near-zero forgetting while simultaneously enhancing bidirectional positive knowledge transfer.
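To make the mask-based isolation concrete, here is a minimal, illustrative sketch (not the paper's implementation) of isolating a sparse sub-network with a task-specific binary mask over a shared dense weight matrix; the helper `make_mask` and the magnitude-based selection rule are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# A dense weight matrix shared across tasks (toy single linear layer).
W = rng.standard_normal((8, 8))

def make_mask(weights, sparsity=0.75):
    """Hypothetical mask rule: keep the largest-magnitude (1 - sparsity)
    fraction of weights as this task's sparse sub-network."""
    k = int(weights.size * (1 - sparsity))
    thresh = np.sort(np.abs(weights), axis=None)[-k]
    return (np.abs(weights) >= thresh).astype(weights.dtype)

mask_t1 = make_mask(W)

def forward(x, W, mask):
    # Only the masked sub-network participates in this task's forward pass,
    # so later tasks can train the remaining weights without overwriting it.
    return x @ (W * mask)

x = rng.standard_normal(8)
y = forward(x, W, mask_t1)
print(y.shape)  # (8,)
```

Freezing the masked weights after a task finishes is what yields near-zero forgetting in this family of methods: each task's sub-network is never touched again, only read.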
Abstract
Existing research on continual learning (CL) of a sequence of tasks focuses mainly on dealing with catastrophic forgetting (CF) to balance the learning plasticity of new tasks and the memory stability of old tasks. However, an ideal CL agent should not only overcome CF but also encourage positive forward and backward knowledge transfer (KT), i.e., using the knowledge learned from previous tasks to learn the new task (forward KT, FKT) and improving the previous tasks' performance with the knowledge of the new task (backward KT, BKT). To this end, this paper first models CL as an optimization problem in which each sequential learning task aims to achieve its optimal performance under the constraint that both FKT and BKT are positive. It then proposes a novel Enhanced Task Continual Learning (ETCL) method, which achieves forgetting-free learning and positive KT. Furthermore, the bounds beyond which FKT and BKT become negative are estimated theoretically. Based on these bounds, a new strategy for online task-similarity detection is also proposed to facilitate positive KT. To overcome CF, ETCL learns a set of task-specific binary masks that isolate a sparse sub-network for each task while preserving the performance of a dense network on the task. At the start of learning a new task, ETCL aligns the new task's gradient with that of the sub-network of the most similar previous task to ensure positive FKT. Using a new bi-objective optimization strategy together with an orthogonal gradient projection method, ETCL updates only the weights of similar previous tasks at the classification layer to achieve positive BKT. Extensive evaluations demonstrate that ETCL markedly outperforms strong baselines on dissimilar, similar, and mixed task sequences.
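The gradient alignment and orthogonal projection ideas mentioned in the abstract can be sketched as follows. This is a toy illustration under stated assumptions, not ETCL itself: `g_old` stands in for the gradient direction of a previous task's sub-network, the positive-cosine test is a stand-in for the paper's task-similarity check, and single-vector Gram-Schmidt stands in for the full orthogonal-projection machinery.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two gradient vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def project_orthogonal(g_new, basis):
    """Remove the components of g_new that lie along the given (previous-task)
    gradient directions, so the update does not interfere with them."""
    g = g_new.copy()
    for b in basis:
        b = b / np.linalg.norm(b)
        g = g - (g @ b) * b
    return g

# Hypothetical gradients: g_old from a previous similar task, g_new for the new task.
g_old = np.array([1.0, 0.0, 0.0])
g_new = np.array([0.6, 0.8, 0.0])

# If the gradients align (positive cosine), the old sub-network's direction is
# expected to help the new task (positive FKT); if they conflict, projecting
# the new gradient orthogonally protects the previous task's performance.
if cosine(g_new, g_old) > 0:
    update = g_new
else:
    update = project_orthogonal(g_new, [g_old])

print(update)
```

With several stored directions, one would typically orthonormalize the basis first; the sequential subtraction above is exact only for a single direction or an already-orthogonal set.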