🤖 AI Summary
In continual learning, artificial neural networks suffer from catastrophic forgetting—performance on previously learned tasks degrades significantly upon training on new tasks. Existing approaches rely on heuristic task-scheduling protocols lacking theoretical guarantees of optimality. This paper bridges statistical physics and optimal control theory to establish, for the first time, an analytically tractable and provably optimal framework for task selection dynamics. Leveraging a teacher–student model, we derive exact training dynamics via dynamic mean-field analysis and obtain a closed-form optimal scheduling protocol that explicitly incorporates task similarity as a key regulator of forgetting. Empirical evaluation on synthetic data and real-world benchmarks (e.g., CIFAR-100) demonstrates substantial reduction in forgetting rates. Crucially, theoretical predictions align closely with experimental results, validating the framework’s strong interpretability, formal optimality guarantee, and cross-dataset generalizability.
📝 Abstract
Artificial neural networks often struggle with catastrophic forgetting when learning multiple tasks sequentially, as training on new tasks degrades the performance on previously learned tasks. Recent theoretical work has addressed this issue by analysing learning curves in synthetic frameworks under predefined training protocols. However, these protocols relied on heuristics and lacked a solid theoretical foundation assessing their optimality. In this paper, we fill this gap by combining exact equations for training dynamics, derived using statistical physics techniques, with optimal control methods. We apply this approach to teacher-student models for continual learning and multi-task problems, obtaining a theory for task-selection protocols maximising performance while minimising forgetting. Our theoretical analysis offers non-trivial yet interpretable strategies for mitigating catastrophic forgetting, shedding light on how optimal learning protocols modulate established effects, such as the influence of task similarity on forgetting. Finally, we validate our theoretical findings with experiments on real-world data.