Continual learning with the neural tangent ensemble

πŸ“… 2024-08-30
πŸ›οΈ Neural Information Processing Systems
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This work addresses catastrophic forgetting in continual learning by proposing a forgetting-free learning framework grounded in the Neural Tangent Kernel (NTK) and Bayesian inference. Methodologically, it interprets a deep network as a Bayesian ensemble of Neural Tangent Experts, one per parameter, each a classifier that is fixed in the lazy regime and adaptive away from it. Theoretically, under the lazy training regime the network is proven equivalent to an ensemble of fixed experts, and the Bayesian posterior update over experts reduces to a scaled and projected form of stochastic gradient descent over the network weights. This framework connects static and dynamic ensemble views of neural networks and yields interpretable, computationally tractable parameter-update rules. Empirically, it significantly mitigates forgetting on standard continual learning benchmarks, supporting both the theory and its practical efficacy.
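The lazy-regime claim above can be checked numerically in a toy setting: linearizing a network around its initialization expresses its output as a fixed offset plus a weighted sum of N fixed per-parameter functions (the Jacobian columns), which is the ensemble view the summary describes. The sketch below is an illustration under stated assumptions, not the paper's construction; the two-layer architecture, sizes, and finite-difference Jacobian are all choices made here for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny two-layer network (illustrative only; not the paper's architecture).
def forward(params, x):
    W1, W2 = params
    return W2 @ np.tanh(W1 @ x)

shapes = [(8, 4), (3, 8)]

def unflatten(theta, shapes):
    out, i = [], 0
    for s in shapes:
        n = int(np.prod(s))
        out.append(theta[i:i + n].reshape(s))
        i += n
    return out

theta0 = rng.normal(size=sum(int(np.prod(s)) for s in shapes)) * 0.5
x = rng.normal(size=4)

# One "expert" per parameter: the Jacobian column of the output with
# respect to that parameter, evaluated (and frozen) at theta0.
def jacobian(theta, x, eps=1e-5):
    f0 = forward(unflatten(theta, shapes), x)
    J = np.zeros((f0.size, theta.size))
    for i in range(theta.size):
        tp = theta.copy()
        tp[i] += eps
        J[:, i] = (forward(unflatten(tp, shapes), x) - f0) / eps
    return f0, J

f0, J = jacobian(theta0, x)

# Lazy regime: after a small weight update, the network output is
# (approximately) a weighted sum of these N fixed functions:
#   f(theta) ~= f(theta0) + J @ (theta - theta0)
delta = rng.normal(size=theta0.size) * 1e-3
f_true = forward(unflatten(theta0 + delta, shapes), x)
f_lin = f0 + J @ delta
print(np.max(np.abs(f_true - f_lin)))  # small: the ensemble view holds
```

The gap between `f_true` and `f_lin` shrinks as updates get smaller (or, in the paper's setting, as width grows), which is the sense in which the per-parameter experts are "fixed throughout learning."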

πŸ“ Abstract
A natural strategy for continual learning is to weigh a Bayesian ensemble of fixed functions. This suggests that if a (single) neural network could be interpreted as an ensemble, one could design effective algorithms that learn without forgetting. To realize this possibility, we observe that a neural network classifier with N parameters can be interpreted as a weighted ensemble of N classifiers, and that in the lazy regime limit these classifiers are fixed throughout learning. We call these classifiers the neural tangent experts and show they output valid probability distributions over the labels. We then derive the likelihood and posterior probability of each expert given past data. Surprisingly, the posterior updates for these experts are equivalent to a scaled and projected form of stochastic gradient descent (SGD) over the network weights. Away from the lazy regime, networks can be seen as ensembles of adaptive experts which improve over time. These results offer a new interpretation of neural networks as Bayesian ensembles of experts, providing a principled framework for understanding and mitigating catastrophic forgetting in continual learning settings.
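The abstract's opening strategy, weighting a Bayesian ensemble of fixed functions, can be sketched concretely: with fixed experts, sequential Bayes only accumulates log-likelihoods, so no update overwrites what earlier data established, which is why the strategy is naturally forgetting-free. The toy experts and data stream below are assumptions made for illustration; the paper's experts are instead derived from the network's parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
n_experts, n_labels, n_steps = 5, 3, 50

# Hypothetical fixed experts: each maps an input to a probability
# distribution over labels via a frozen random linear map (assumption).
def expert_probs(i, x):
    r = np.random.default_rng(i)          # same map every call -> fixed expert
    logits = r.normal(size=(n_labels, x.size)) @ x
    e = np.exp(logits - logits.max())
    return e / e.sum()

log_w = np.zeros(n_experts)               # uniform prior over experts

for t in range(n_steps):
    x = rng.normal(size=4)
    y = 0                                 # toy stream: label 0 observed
    # Bayes rule over experts: posterior log-weight accumulates the
    # log-likelihood each fixed expert assigns to the observed label.
    for i in range(n_experts):
        log_w[i] += np.log(expert_probs(i, x)[y])

w = np.exp(log_w - log_w.max())
w /= w.sum()                              # posterior over experts
```

Because each step only adds a log-likelihood term, evidence from old data is never erased; the paper's contribution is showing that this posterior update, applied to its neural tangent experts, coincides with a scaled and projected form of SGD on the weights.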
Problem

Research questions and friction points this paper is trying to address.

Continual learning without forgetting
Neural networks as Bayesian ensembles
Mitigating catastrophic forgetting in learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Neural tangent ensemble interpretation
Bayesian framework for continual learning
Mitigating catastrophic forgetting with SGD
Ari S. Benjamin
Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724
Christian Pehle
NeuroAI Scholar, Cold Spring Harbor Laboratory
Machine Learning · Computational Neuroscience · Theoretical Physics · Brain-inspired Computing
Kyle Daruwalla
Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724