🤖 AI Summary
Continual learning (CL) suffers from catastrophic forgetting and a lack of theoretical guarantees—especially when applied to pre-trained models. To address these challenges, we propose TSVD-CL: a simple yet theoretically rigorous CL method based on feature lifting and dynamic truncated singular value decomposition (TSVD). Our work establishes, for the first time in CL, provably tight upper bounds on both training and generalization errors. We design a TSVD mechanism satisfying recursive update constraints, enabling numerically stable and provably optimal learning within an overparameterized minimum-norm regression framework. TSVD-CL achieves state-of-the-art performance across multiple standard CL benchmarks, exhibits strong robustness to hyperparameters, supports continual learning over 100 tasks, and ensures bounded, controllable error—thereby bridging the long-standing gap between theoretical rigor and empirical effectiveness in continual learning.
📝 Abstract
The goal of continual learning (CL) is to train a model that can solve multiple tasks presented sequentially. Recent CL approaches have achieved strong performance by leveraging large pre-trained models that generalize well to downstream tasks. However, such methods lack theoretical guarantees, making them prone to unexpected failures. Conversely, principled CL approaches often fail to achieve competitive performance. In this work, we aim to bridge this gap between theory and practice by designing a simple CL method that is both theoretically sound and highly performant. Specifically, we lift pre-trained features into a higher-dimensional space and formulate an overparameterized minimum-norm least-squares problem. We find that the lifted features are highly ill-conditioned, potentially leading to large training errors (numerical instability) and increased generalization errors. We address these challenges by continually truncating the singular value decomposition (SVD) of the lifted features. Our approach, termed TSVD, is stable with respect to the choice of hyperparameters, can handle hundreds of tasks, and outperforms state-of-the-art CL methods on multiple datasets. Importantly, our method satisfies a recurrence relation throughout its continual learning process, which allows us to prove that it maintains small training and generalization errors by appropriately truncating a fraction of the SVD factors. This yields a stable continual learning method with strong empirical performance and theoretical guarantees. Code is available at https://github.com/liangzu/tsvd.
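To make the core idea concrete, here is a minimal sketch of the two ingredients the abstract describes: lifting features into a higher-dimensional space and solving the resulting least-squares problem via a truncated SVD, which discards the small singular values responsible for ill-conditioning. This is an illustrative toy (the lifting map `lift_features`, the `rank_frac` parameter, and all names are our own assumptions, not the authors' implementation, which also handles recursive updates across tasks).

```python
import numpy as np

def lift_features(X, W, b):
    """Hypothetical random ReLU lifting of pre-trained features
    into a higher-dimensional space (one common choice; the
    paper's exact lifting may differ)."""
    return np.maximum(X @ W + b, 0.0)

def tsvd_solve(Phi, Y, rank_frac=0.5):
    """Minimum-norm least-squares solution via truncated SVD:
    keep only the top singular directions, controlling the
    condition number of the (ill-conditioned) lifted features."""
    U, s, Vt = np.linalg.svd(Phi, full_matrices=False)
    k = max(1, int(rank_frac * len(s)))          # number of factors kept
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k]
    # W* = V_k diag(1/s_k) U_k^T Y  (truncated pseudoinverse applied to Y)
    return Vt_k.T @ ((U_k.T @ Y) / s_k[:, None])

# toy usage: lift 10-dim features to 256 dims, fit one task's labels
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))    # 50 samples of pre-trained features
Y = rng.standard_normal((50, 3))     # 3 output targets
W = rng.standard_normal((10, 256))
b = rng.standard_normal(256)
Phi = lift_features(X, W, b)
W_star = tsvd_solve(Phi, Y, rank_frac=0.25)
print(W_star.shape)  # (256, 3)
```

In the continual setting, the paper maintains the truncated SVD factors recursively as new tasks arrive rather than recomputing them from scratch; the sketch above shows only the single-batch solve.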