AI Summary
To address the fundamental trade-off between catastrophic forgetting and loss of plasticity in continual learning, this work models network weights as a nonlinear Gaussian state-space model. It introduces the first unified weight-space framework that integrates the Laplace approximation with Gaussian filtering and Rauch–Tung–Striebel (RTS) smoothing. The method employs a diagonal-plus-low-rank posterior covariance approximation driven by the structure of the generalized Gauss–Newton (GGN) matrix, enabling task-shift-aware prior modeling and Bayesian smoothing without revisiting the original data. Crucially, it breaks the conventional unidirectional information flow by explicitly incorporating task-relational priors to improve the efficiency of knowledge reuse. Empirically, the approach significantly mitigates forgetting and improves accuracy on subsequent tasks across standard continual learning benchmarks. Moreover, even single-task models benefit from post-hoc smoothing, achieving further gains under data-free conditions.
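The diagonal-plus-low-rank structure mentioned above can be illustrated with a minimal numpy sketch. This is not the paper's exact scheme: it assumes a squared loss so the GGN reduces to `J.T @ J` for per-example Jacobians `J`, and the function name is hypothetical. The idea is to keep the top-k eigendirections exactly and absorb the remainder into a diagonal.

```python
import numpy as np

def diag_plus_low_rank_ggn(J, k):
    """Approximate GGN = J.T @ J as diag(D) + U @ U.T with rank-k U.

    Illustrative sketch (hypothetical helper, not the paper's algorithm).
    J: (N, P) per-example Jacobians; squared loss => unit output Hessian.
    """
    # Top-k right singular vectors of J give the dominant part of the GGN.
    _, s, Vt = np.linalg.svd(J, full_matrices=False)
    U = Vt[:k].T * s[:k]                        # (P, k): U @ U.T = rank-k GGN part
    ggn_diag = np.einsum("np,np->p", J, J)      # exact diagonal of J.T @ J
    # Residual diagonal is nonnegative, so diag(D) + U U^T matches the
    # exact GGN diagonal while keeping the dominant off-diagonal structure.
    D = ggn_diag - np.einsum("pk,pk->p", U, U)
    return D, U
```

Storing `(D, U)` costs O(P·k) instead of O(P²), which is what makes the Laplace posterior usable as a regularizer at neural-network scale.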
Abstract
Efficiently learning a sequence of related tasks, as in continual learning, poses a significant challenge for neural networks due to the delicate trade-off between catastrophic forgetting and loss of plasticity. We address this challenge with a framework for sequentially learning related tasks grounded in Bayesian inference. Specifically, we treat the model's parameters as a nonlinear Gaussian state-space model and perform efficient inference using Gaussian filtering and smoothing. This general formalism subsumes existing continual learning approaches while offering a clearer conceptual understanding of their components. Leveraging Laplace approximations during filtering, we construct a Gaussian posterior measure on the weight space of the neural network for each task. These posteriors serve as efficient regularizers, exploiting the structure of the generalized Gauss–Newton (GGN) matrix to build diagonal-plus-low-rank approximations. The dynamics model allows targeted control of the learning process and the incorporation of domain-specific knowledge, such as modeling the type of shift between tasks. Additionally, approximate Bayesian smoothing can enhance the performance of task-specific models without re-accessing any data.
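The filtering-then-smoothing pipeline over weights can be sketched in the simplest (fully diagonal) setting. This is an illustrative toy, not the paper's method: each task's MAP estimate is treated as a noisy identity observation of the latent weights (observation variance `R`), the weights follow a random walk with drift variance `Q` (a stand-in for the task-shift dynamics model), and the backward pass is the standard RTS recursion. All names are hypothetical.

```python
import numpy as np

def filter_smooth(task_means, P0, Q, R):
    """Per-weight Gaussian filtering + RTS smoothing across tasks (toy sketch).

    task_means: (T, P) array of per-task MAP weight estimates.
    P0, Q, R:   prior, dynamics (drift), and observation variances (scalars).
    Returns filtered and smoothed means/variances, each of shape (T, P).
    """
    T, P = task_means.shape
    m_pred, P_pred = np.zeros((T, P)), np.zeros((T, P))
    m_filt, P_filt = np.zeros((T, P)), np.zeros((T, P))
    m, v = np.zeros(P), np.full(P, P0)
    for k in range(T):
        # Predict: random-walk dynamics inflate the variance by Q.
        m_pred[k], P_pred[k] = m, v + Q
        # Update: Kalman gain for an identity observation model.
        K = P_pred[k] / (P_pred[k] + R)
        m = m_pred[k] + K * (task_means[k] - m_pred[k])
        v = (1.0 - K) * P_pred[k]
        m_filt[k], P_filt[k] = m, v
    # Backward RTS pass: refines earlier tasks with later evidence,
    # using only the stored filtering quantities (no data re-access).
    m_s, P_s = m_filt.copy(), P_filt.copy()
    for k in range(T - 2, -1, -1):
        G = P_filt[k] / P_pred[k + 1]
        m_s[k] = m_filt[k] + G * (m_s[k + 1] - m_pred[k + 1])
        P_s[k] = P_filt[k] + G**2 * (P_s[k + 1] - P_pred[k + 1])
    return m_filt, P_filt, m_s, P_s
```

The smoothed variances never exceed the filtered ones, which mirrors the abstract's point that smoothing can improve task-specific models post hoc without revisiting any data.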