Recovery Guarantees for Continual Learning of Dependent Tasks: Memory, Data-Dependent Regularization, and Data-Dependent Weights

📅 2026-04-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses catastrophic forgetting in continual learning, which arises from distribution shift across tasks, in the setting where tasks are dependent. The authors model the data of the current task as a nonlinear transformation of the data from previous tasks, which formalizes the task-dependency structure. Under this assumption, and for nonlinear regression tasks, they prove bounds on the estimation error for several continual learning paradigms: experience replay with data-independent regularization and weights, replay with data-dependent weights, and data-dependent regularization such as knowledge distillation. According to the authors, these bounds remain informative in cases where prior work gives vacuous bounds.
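The task-dependency assumption can be illustrated with a toy data generator. This is a minimal sketch, not the paper's construction: the map `transform`, the Gaussian inputs for the first task, and the helper name `make_dependent_tasks` are all hypothetical choices made for illustration.

```python
import math
import random

def transform(x):
    """Hypothetical nonlinear map linking consecutive tasks (an assumption)."""
    return math.sin(x) + 0.5 * x

def make_dependent_tasks(n_tasks, n_samples, seed=0):
    """Draw inputs for the first task at random, then obtain each later
    task's inputs by applying `transform` to the previous task's inputs."""
    rng = random.Random(seed)
    tasks = [[rng.gauss(0.0, 1.0) for _ in range(n_samples)]]
    for _ in range(n_tasks - 1):
        tasks.append([transform(x) for x in tasks[-1]])
    return tasks

tasks = make_dependent_tasks(n_tasks=3, n_samples=5)
```

Each task's inputs are fully determined by the previous task's inputs here, which is the simplest form the dependency could take; the paper's assumptions may be more general.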

📝 Abstract

Continual learning (CL) is concerned with learning multiple tasks sequentially without forgetting previously learned tasks. Despite substantial empirical advances over recent years, the theoretical development of CL remains in its infancy. At the heart of developing CL theory lies the challenge that the data distribution varies across tasks, and we argue that properly addressing this challenge requires understanding this variation, namely the dependency among tasks. To explicitly model task dependency, we consider nonlinear regression tasks and propose the assumption that these tasks are dependent in such a way that the data of the current task is a nonlinear transformation of previous data. With this model and under natural assumptions, we prove statistical recovery guarantees (more specifically, bounds on estimation errors) for several CL paradigms in practical use, including experience replay with data-independent regularization and data-independent weights that balance the losses of tasks, replay with data-dependent weights, and continual learning with data-dependent regularization (e.g., knowledge distillation). To the best of our knowledge, our bounds are informative in cases where prior work gives vacuous bounds.
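Two of the paradigms named in the abstract can be sketched as loss functions. This is a minimal illustration under stated assumptions, not the paper's formulation: the one-parameter model `predict`, the squared loss, the balancing weight `w`, the coefficient `lam`, and all function names are hypothetical.

```python
import math

def predict(theta, x):
    """Hypothetical one-parameter nonlinear regression model (an assumption)."""
    return math.tanh(theta * x)

def mse(theta, data):
    """Mean squared error of the model over a list of (x, y) pairs."""
    return sum((predict(theta, x) - y) ** 2 for x, y in data) / len(data)

def replay_loss(theta, current, memory, w):
    """Experience replay: weighted sum of the current-task loss and the loss
    on stored samples from past tasks. A fixed w in [0, 1] plays the role of
    a data-independent weight; a data-dependent weight would instead be
    computed from the samples (not shown)."""
    return w * mse(theta, current) + (1.0 - w) * mse(theta, memory)

def distillation_loss(theta, theta_prev, current, lam):
    """Data-dependent regularization in the style of knowledge distillation:
    penalize disagreement with the previous model's predictions, evaluated
    on the current task's inputs."""
    penalty = sum(
        (predict(theta, x) - predict(theta_prev, x)) ** 2 for x, _ in current
    ) / len(current)
    return mse(theta, current) + lam * penalty

# Toy data: past samples follow theta = 1.0, current samples follow theta = 1.5.
memory = [(x, math.tanh(1.0 * x)) for x in (-1.0, 0.0, 1.0)]
current = [(x, math.tanh(1.5 * x)) for x in (-0.5, 0.5, 2.0)]
```

The regularizer in `distillation_loss` is data-dependent because it is evaluated on the current task's inputs, whereas a penalty such as the squared distance between `theta` and `theta_prev` would not depend on the data.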

Problem

Research questions and friction points this paper is trying to address.

continual learning

task dependency

recovery guarantees

data distribution shift

nonlinear regression

Innovation

Methods, ideas, or system contributions that make the work stand out.

continual learning

task dependency

recovery guarantees