AI Summary
To address the fundamental trade-off between catastrophic forgetting and loss of plasticity in continual learning, this work models network weights as a nonlinear Gaussian state-space model. It introduces the first unified weight-space framework that integrates the Laplace approximation with Gaussian filtering and Rauch–Tung–Striebel (RTS) smoothing. The method employs a diagonal-plus-low-rank posterior covariance approximation driven by the structure of the generalized Gauss–Newton (GGN) matrix, enabling task-shift-aware prior modeling and Bayesian smoothing without revisiting the original data. Crucially, it breaks the conventional unidirectional information flow by explicitly incorporating task-relational priors to improve the efficiency of knowledge reuse. Empirically, the approach significantly mitigates forgetting and improves accuracy on subsequent tasks across standard continual learning benchmarks. Moreover, even single-task models benefit from post-hoc smoothing, achieving further gains under data-free conditions.
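The diagonal-plus-low-rank structure mentioned above can be illustrated with a minimal numpy sketch. This is not the paper's exact scheme: it assumes a squared loss so the GGN reduces to `J.T @ J` for per-example Jacobians `J`, and the function name is hypothetical. The idea is to keep the top-k eigendirections exactly and absorb the remainder into a diagonal.

```python
import numpy as np

def diag_plus_low_rank_ggn(J, k):
    """Approximate GGN = J.T @ J as diag(D) + U @ U.T with rank-k U.

    Illustrative sketch (hypothetical helper, not the paper's algorithm).
    J: (N, P) per-example Jacobians; squared loss => unit output Hessian.
    """
    # Top-k right singular vectors of J give the dominant part of the GGN.
    _, s, Vt = np.linalg.svd(J, full_matrices=False)
    U = Vt[:k].T * s[:k]                        # (P, k): U @ U.T = rank-k GGN part
    ggn_diag = np.einsum("np,np->p", J, J)      # exact diagonal of J.T @ J
    # Residual diagonal is nonnegative, so diag(D) + U U^T matches the
    # exact GGN diagonal while keeping the dominant off-diagonal structure.
    D = ggn_diag - np.einsum("pk,pk->p", U, U)
    return D, U
```

Storing `(D, U)` costs O(P·k) instead of O(P²), which is what makes the Laplace posterior usable as a regularizer at neural-network scale.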
Abstract
Efficiently learning a sequence of related tasks, as in continual learning, poses a significant challenge for neural networks due to the delicate trade-off between catastrophic forgetting and loss of plasticity. We address this challenge with a framework for sequentially learning related tasks grounded in Bayesian inference. Specifically, we treat the model's parameters as a nonlinear Gaussian state-space model and perform efficient inference using Gaussian filtering and smoothing. This general formalism subsumes existing continual learning approaches while offering a clearer conceptual understanding of their components. Leveraging Laplace approximations during filtering, we construct a Gaussian posterior measure on the weight space of the neural network for each task. These posteriors serve as efficient regularizers, exploiting the structure of the generalized Gauss–Newton (GGN) matrix to build diagonal-plus-low-rank approximations. The dynamics model allows targeted control of the learning process and the incorporation of domain-specific knowledge, such as modeling the type of shift between tasks. Additionally, approximate Bayesian smoothing can enhance the performance of task-specific models without re-accessing any data.
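The filtering-then-smoothing pipeline over weights can be sketched in the simplest (fully diagonal) setting. This is an illustrative toy, not the paper's method: each task's MAP estimate is treated as a noisy identity observation of the latent weights (observation variance `R`), the weights follow a random walk with drift variance `Q` (a stand-in for the task-shift dynamics model), and the backward pass is the standard RTS recursion. All names are hypothetical.

```python
import numpy as np

def filter_smooth(task_means, P0, Q, R):
    """Per-weight Gaussian filtering + RTS smoothing across tasks (toy sketch).

    task_means: (T, P) array of per-task MAP weight estimates.
    P0, Q, R:   prior, dynamics (drift), and observation variances (scalars).
    Returns filtered and smoothed means/variances, each of shape (T, P).
    """
    T, P = task_means.shape
    m_pred, P_pred = np.zeros((T, P)), np.zeros((T, P))
    m_filt, P_filt = np.zeros((T, P)), np.zeros((T, P))
    m, v = np.zeros(P), np.full(P, P0)
    for k in range(T):
        # Predict: random-walk dynamics inflate the variance by Q.
        m_pred[k], P_pred[k] = m, v + Q
        # Update: Kalman gain for an identity observation model.
        K = P_pred[k] / (P_pred[k] + R)
        m = m_pred[k] + K * (task_means[k] - m_pred[k])
        v = (1.0 - K) * P_pred[k]
        m_filt[k], P_filt[k] = m, v
    # Backward RTS pass: refines earlier tasks with later evidence,
    # using only the stored filtering quantities (no data re-access).
    m_s, P_s = m_filt.copy(), P_filt.copy()
    for k in range(T - 2, -1, -1):
        G = P_filt[k] / P_pred[k + 1]
        m_s[k] = m_filt[k] + G * (m_s[k + 1] - m_pred[k + 1])
        P_s[k] = P_filt[k] + G**2 * (P_s[k + 1] - P_pred[k + 1])
    return m_filt, P_filt, m_s, P_s
```

The smoothed variances never exceed the filtered ones, which mirrors the abstract's point that smoothing can improve task-specific models post hoc without revisiting any data.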