🤖 AI Summary
This work investigates whether Bayesian methods can mitigate catastrophic forgetting in continual learning. We identify fundamental limitations of sequential Bayesian inference in Bayesian neural networks (BNNs), in particular model misspecification and sensitivity to task data imbalance, which arise when weight-space posteriors are reused as priors across tasks. We employ Hamiltonian Monte Carlo (HMC) to propagate posteriors sequentially and find that even this high-fidelity approach fails to prevent forgetting, revealing the inadequacy of sequential inference in weight space. To overcome this, we propose a new approach, Prototypical Bayesian Continual Learning, which shifts the modeling from weight-space priors to task-level latent variables governing the generative process, thereby decoupling knowledge representation from parameter updates. Experiments show that sequential weight-space inference in BNNs suffers forgetting, whereas our approach is competitive with the best-performing Bayesian continual learning methods on class-incremental vision benchmarks, providing a simple, interpretable, and scalable baseline.
📝 Abstract
Sequential Bayesian inference can be used for continual learning to prevent catastrophic forgetting of past tasks and to provide an informative prior when learning new tasks. We revisit sequential Bayesian inference and assess whether using the previous task’s posterior as a prior for a new task can prevent catastrophic forgetting in Bayesian neural networks. Our first contribution is to perform sequential Bayesian inference using Hamiltonian Monte Carlo. We propagate the posterior as a prior for new tasks by fitting a density estimator to Hamiltonian Monte Carlo samples. We find that this approach fails to prevent catastrophic forgetting, demonstrating the difficulty of performing sequential Bayesian inference in neural networks. From there, we study simple analytical examples of sequential Bayesian inference and continual learning, and highlight the issue of model misspecification, which can lead to sub-optimal continual learning performance despite exact inference. Furthermore, we discuss how task data imbalances can cause forgetting. Given these limitations, we argue for probabilistic models of the continual learning generative process rather than sequential Bayesian inference over Bayesian neural network weights. Our final contribution is a simple baseline called Prototypical Bayesian Continual Learning, which is competitive with the best-performing Bayesian continual learning methods on class-incremental continual learning computer vision benchmarks.
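The posterior-as-prior recipe described above can be illustrated with a minimal sketch. The toy below is an assumption-laden stand-in for the paper's setup: it infers the mean of a 1D Gaussian (rather than BNN weights), simulates "HMC samples" by sampling from a conjugate posterior, and uses a Gaussian fit to those samples as a density estimator that becomes the prior for the next task. In this well-specified conjugate case sequential updating behaves as hoped; the paper's point is that this breaks down for neural network weight posteriors.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_posterior(prior_mu, prior_var, data, noise_var=1.0):
    """Conjugate normal-normal update for a Gaussian mean with known noise variance."""
    n = len(data)
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mu = post_var * (prior_mu / prior_var + np.sum(data) / noise_var)
    return post_mu, post_var

# Task 1: data centred at 2.0, starting from a broad prior.
task1 = rng.normal(2.0, 1.0, size=200)
mu1, var1 = gaussian_posterior(0.0, 100.0, task1)

# Stand-in for HMC: draw samples from the task-1 posterior, then fit a
# Gaussian density estimator to them (mean and variance of the samples).
samples = rng.normal(mu1, np.sqrt(var1), size=5000)
fit_mu, fit_var = samples.mean(), samples.var()

# Task 2: the fitted density becomes the prior, as in sequential Bayesian inference.
task2 = rng.normal(2.0, 1.0, size=200)
mu2, var2 = gaussian_posterior(fit_mu, fit_var, task2)

print(mu2)  # close to the true mean 2.0; uncertainty shrinks across tasks
```

The key design point is that the prior for task 2 is not the analytic task-1 posterior but a density fitted to samples from it, mirroring how the paper approximates HMC posteriors before propagating them.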