Optimal Protocols for Continual Learning via Statistical Physics and Control Theory

📅 2024-09-26
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
In continual learning, artificial neural networks suffer from catastrophic forgetting: performance on previously learned tasks degrades significantly after training on new tasks. Existing approaches rely on heuristic task-scheduling protocols that lack theoretical guarantees of optimality. This paper bridges statistical physics and optimal control theory to establish, for the first time, an analytically tractable and provably optimal framework for task-selection dynamics. Using a teacher–student model, the authors derive exact training dynamics via dynamic mean-field analysis and obtain a closed-form optimal scheduling protocol that explicitly incorporates task similarity as a key regulator of forgetting. Empirical evaluation on synthetic data and real-world benchmarks (e.g., CIFAR-100) demonstrates a substantial reduction in forgetting. The theoretical predictions align closely with the experimental results, supporting the framework’s interpretability, formal optimality guarantee, and cross-dataset generalizability.

📝 Abstract
Artificial neural networks often struggle with catastrophic forgetting when learning multiple tasks sequentially, as training on new tasks degrades the performance on previously learned tasks. Recent theoretical work has addressed this issue by analysing learning curves in synthetic frameworks under predefined training protocols. However, these protocols relied on heuristics and lacked a solid theoretical foundation assessing their optimality. In this paper, we fill this gap by combining exact equations for training dynamics, derived using statistical physics techniques, with optimal control methods. We apply this approach to teacher-student models for continual learning and multi-task problems, obtaining a theory for task-selection protocols maximising performance while minimising forgetting. Our theoretical analysis offers non-trivial yet interpretable strategies for mitigating catastrophic forgetting, shedding light on how optimal learning protocols modulate established effects, such as the influence of task similarity on forgetting. Finally, we validate our theoretical findings with experiments on real-world data.
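The teacher-student setting and the notion of a task-selection protocol mentioned in the abstract can be illustrated with a toy simulation. The following is only a minimal sketch under strong simplifying assumptions, not the paper's model or its optimal protocol: the student and both teachers are single linear units, labels are noiseless, training is plain online SGD, and the protocol is a fixed "task 0, then task 1" schedule. All names (`make_teachers`, `run_protocol`, `forgetting`) and parameter values are hypothetical and chosen for illustration.

```python
import math
import random


def make_teachers(n, similarity, rng):
    """Return two teacher vectors of norm sqrt(n) with the given cosine similarity."""
    a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    b = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm_a = math.sqrt(sum(x * x for x in a))
    a = [x / norm_a for x in a]
    # Gram-Schmidt step: remove the component of b along a, then normalise.
    dot = sum(x * y for x, y in zip(a, b))
    b = [y - dot * x for x, y in zip(a, b)]
    norm_b = math.sqrt(sum(y * y for y in b))
    b = [y / norm_b for y in b]
    scale = math.sqrt(n)
    ortho = math.sqrt(1.0 - similarity ** 2)
    t0 = [scale * x for x in a]
    t1 = [scale * (similarity * x + ortho * y) for x, y in zip(a, b)]
    return [t0, t1]


def gen_error(w, teacher):
    """Generalisation error of a linear student for standard Gaussian inputs."""
    n = len(w)
    return 0.5 * sum((wi - ti) ** 2 for wi, ti in zip(w, teacher)) / n


def run_protocol(protocol, teachers, lr, rng):
    """Online SGD following `protocol`, a list of task indices (one per step).

    Returns the generalisation error on task 0 after every step.
    """
    n = len(teachers[0])
    w = [0.0] * n
    errors_task0 = []
    for task in protocol:
        teacher = teachers[task]
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Student output minus teacher output on this sample.
        delta = sum((wi - ti) * xi for wi, ti, xi in zip(w, teacher, x)) / math.sqrt(n)
        w = [wi - lr * delta * xi / math.sqrt(n) for wi, xi in zip(w, x)]
        errors_task0.append(gen_error(w, teachers[0]))
    return errors_task0


def forgetting(similarity, steps=400, n=20, lr=0.5, seed=0):
    """Increase in task-0 error caused by subsequently training on task 1."""
    rng = random.Random(seed)
    teachers = make_teachers(n, similarity, rng)
    protocol = [0] * steps + [1] * steps  # sequential schedule (an assumption)
    errors = run_protocol(protocol, teachers, lr, rng)
    return errors[-1] - errors[steps - 1]


if __name__ == "__main__":
    for s in (0.0, 0.5, 0.9):
        print(f"similarity={s:.1f}  forgetting={forgetting(s):.3f}")
```

In this linear toy model the student ends up close to the second teacher, so forgetting is roughly 1 - similarity and shrinks monotonically as the tasks become more alike. That monotonic behaviour is a property of the sketch only; richer models such as those analysed in the paper can behave differently. Replacing the `protocol` list with another sequence of task indices (for example, an interleaved one) is the hook where different task-selection schedules could be compared.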
Problem

Research questions and friction points this paper is trying to address.

Addresses catastrophic forgetting in neural networks during sequential task learning.
Develops optimal task-selection protocols using statistical physics and control theory.
Validates theoretical strategies for minimizing forgetting on real-world data.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines statistical physics with control theory.
Develops optimal task-selection protocols.
Validates theory with real-world experiments.
Francesco Mori
Rudolf Peierls Centre for Theoretical Physics, University of Oxford, Oxford OX1 3PU, United Kingdom
Stefano Sarao Mannelli
Data Science and AI, Computer Science and Engineering, Chalmers University of Technology and University of Gothenburg, SE-412 96 Gothenburg, Sweden
Francesca Mignacco
Princeton University & City University of New York
Statistical physics, Machine Learning, Theoretical Neuroscience