AI Summary
This work addresses catastrophic forgetting in continual learning, which arises from distribution shift across tasks, with a focus on the case where tasks are dependent. The authors assume that the data of the current task is a nonlinear transformation of the data from previous tasks, which formalizes the task-dependency structure. Under this assumption, and in a nonlinear regression setting, they prove estimation error bounds (statistical recovery guarantees) for several continual learning paradigms: experience replay with data-independent regularization and data-independent loss weights, replay with data-dependent weights, and data-dependent regularization such as knowledge distillation. To the best of the authors' knowledge, these bounds remain informative in cases where prior work gives vacuous bounds.
Abstract
Continual learning (CL) is concerned with learning multiple tasks sequentially without forgetting previously learned tasks. Despite substantial empirical advances over recent years, the theoretical development of CL remains in its infancy. At the heart of developing CL theory lies the challenge that the data distribution varies across tasks, and we argue that properly addressing this challenge requires understanding this variation, that is, the dependency among tasks. To explicitly model task dependency, we consider nonlinear regression tasks and propose the assumption that these tasks are dependent in such a way that the data of the current task is a nonlinear transformation of previous data. With this model and under natural assumptions, we prove statistical recovery guarantees (more specifically, bounds on estimation errors) for several CL paradigms in practical use, including experience replay with data-independent regularization and data-independent weights that balance the losses of tasks, replay with data-dependent weights, and continual learning with data-dependent regularization (e.g., knowledge distillation). To the best of our knowledge, our bounds are informative in cases where prior work gives vacuous bounds.
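To make the setting concrete, the following is a minimal sketch of experience replay with a fixed (data-independent) loss weight on two dependent nonlinear regression tasks. It is an illustration only, not the paper's algorithm or analysis: the `tanh` regression model, the mixing matrix `A`, the shared parameter `w_star`, the memory size, and all function names are assumptions introduced here.

```python
import numpy as np

def predict(w, x):
    # Hypothetical nonlinear regression model: y = tanh(<w, x>).
    return np.tanh(x @ w)

def loss_grad(w, x, y):
    # Gradient of 0.5 * mean((tanh(x w) - y)^2) with respect to w.
    pred = predict(w, x)
    return x.T @ ((pred - y) * (1.0 - pred ** 2)) / len(y)

def make_tasks(seed=0, n=500, d=5, noise=0.05):
    # Task 2 inputs are a nonlinear transformation of task 1 inputs,
    # mirroring the dependency assumption (the tanh(x A) form is assumed).
    rng = np.random.default_rng(seed)
    w_star = rng.normal(size=d) / np.sqrt(d)
    x1 = rng.normal(size=(n, d))
    A = rng.normal(size=(d, d)) / np.sqrt(d)
    x2 = np.tanh(x1 @ A)
    y1 = predict(w_star, x1) + noise * rng.normal(size=n)
    y2 = predict(w_star, x2) + noise * rng.normal(size=n)
    return w_star, (x1, y1), (x2, y2)

def replay_fit(w_init, x_new, y_new, x_mem, y_mem,
               weight=0.5, lr=0.2, steps=3000):
    # Gradient descent on a weighted sum of the current-task loss and
    # the loss on a small memory buffer of past-task samples.
    w = w_init.copy()
    for _ in range(steps):
        grad = ((1.0 - weight) * loss_grad(w, x_new, y_new)
                + weight * loss_grad(w, x_mem, y_mem))
        w -= lr * grad
    return w

w_star, (x1, y1), (x2, y2) = make_tasks()
# Keep only 50 samples of task 1 as the replay memory.
w_hat = replay_fit(np.zeros_like(w_star), x2, y2, x1[:50], y1[:50])
print("estimation error:", np.linalg.norm(w_hat - w_star))
```

The quantity printed at the end is the kind of estimation error the bounds concern. The `weight` argument plays the role of a data-independent weight balancing the two losses; a data-dependent variant would instead compute it from the samples.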