Hyperparameter Selection in Continual Learning

📅 2024-04-09
🏛️ arXiv.org
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
Conventional hyperparameter optimization (HPO) is infeasible in continual learning (CL) because it relies on static, repeated access to the full dataset, which contradicts CL's sequential, non-stationary data stream; a practical, CL-adapted HPO framework is therefore needed.
Method: The paper systematically evaluates realistic, online-compatible HPO strategies, including online validation, replay-based validation, and meta-learning-inspired heuristics, in an empirically grounded, CL-specific comparative study.
Contribution/Results: Standard CL benchmarks fail to meaningfully discriminate between HPO methods; end-of-training HPO is computationally impractical in CL settings; and efficiency metrics (e.g., compute cost, memory, adherence to the single-pass constraint) are more informative than accuracy alone for real-world deployment. The authors consequently advocate purpose-built benchmarks for evaluating CL HPO and recommend lightweight, single-epoch HPO schemes. These findings hold consistently across multiple CL benchmarks, yielding actionable guidance for hyperparameter selection in continual learning.
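Of the strategies named above, replay-based validation is the easiest to picture in code. The sketch below is a hypothetical illustration, not the paper's implementation: a small validation buffer is filled from the stream by reservoir sampling and periodically used to score hyperparameter candidates, all within a single pass over the data. Every name in it (ValidationBuffer, score, the learning-rate grid) is a stand-in.

```python
# Hypothetical sketch of replay-based validation for online HPO:
# keep a small reservoir-sampled buffer of past examples and use it
# as a validation set to score hyperparameter candidates on the fly.
import random

random.seed(0)

class ValidationBuffer:
    def __init__(self, capacity=50):
        self.capacity = capacity
        self.buffer = []
        self.seen = 0

    def add(self, example):
        """Reservoir sampling: each example is kept with prob. capacity/seen."""
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(example)
        else:
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.buffer[j] = example

def score(candidate_lr, buffer):
    # Stand-in for validating a model trained with candidate_lr
    # on the buffered examples; here lr = 0.1 is "best" by construction.
    return -abs(candidate_lr - 0.1) * len(buffer.buffer)

buf = ValidationBuffer(capacity=50)
for step in range(1000):          # single pass over a toy stream
    buf.add(step)
    if step % 200 == 199:         # periodically re-select hyperparameters
        best = max([0.01, 0.1, 1.0], key=lambda lr: score(lr, buf))
        print(f"step {step + 1}: selected lr = {best}")
```

The buffer costs only its fixed capacity in memory, which is why the summary's efficiency metrics (compute, memory, single-pass adherence) are the natural axes for comparing such schemes.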

📝 Abstract
In continual learning (CL) -- where a learner trains on a stream of data -- standard hyperparameter optimisation (HPO) cannot be applied, as a learner does not have access to all of the data at the same time. This has prompted the development of CL-specific HPO frameworks. The most popular way to tune hyperparameters in CL is to repeatedly train over the whole data stream with different hyperparameter settings. However, this end-of-training HPO is unusable in practice since a learner can only see the stream once. Hence, there is an open question: what HPO framework should a practitioner use for a CL problem in reality? This paper looks at this question by comparing several realistic HPO frameworks. We find that none of the HPO frameworks considered, including end-of-training HPO, perform consistently better than the rest on popular CL benchmarks. We therefore arrive at a twofold conclusion: a) to be able to discriminate between HPO frameworks there is a need to move beyond the current most commonly used CL benchmarks, and b) on the popular CL benchmarks examined, a CL practitioner should use a realistic HPO framework and can select it based on factors separate from performance, for example compute efficiency.
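To make the abstract's distinction concrete, here is a minimal hypothetical sketch (all classes and functions are toy stand-ins, not the paper's code) contrasting end-of-training HPO, which needs one full pass over the stream per hyperparameter setting, with a realistic single-pass scheme that tunes on the first task and then commits to one setting.

```python
# Hypothetical sketch contrasting end-of-training HPO with a realistic
# single-pass alternative; Learner and its "training" are toy stand-ins.
import random

random.seed(0)

GRID = [0.01, 0.1, 1.0]  # candidate learning rates

def make_stream(num_tasks=3, task_size=100):
    """A toy stream: each task is just a list of numbers."""
    return [[random.random() for _ in range(task_size)] for _ in range(num_tasks)]

class Learner:
    def __init__(self, lr):
        self.lr = lr
        self.score = 0.0

    def train_on_task(self, task):
        # Stand-in for training: settings closer to lr = 0.1 score higher.
        self.score += len(task) * (1.0 - abs(self.lr - 0.1))

def end_of_training_hpo(stream):
    """Unrealistic baseline: one full pass over the stream per setting."""
    scores = {}
    for lr in GRID:
        learner = Learner(lr)
        for task in stream:  # the stream is replayed for every candidate
            learner.train_on_task(task)
        scores[lr] = learner.score
    return max(scores, key=scores.get)

def first_task_hpo(stream):
    """Realistic alternative: tune on the first task only (in practice, on
    a held-out split of it), then commit for the rest of the single pass."""
    scores = {}
    for lr in GRID:
        learner = Learner(lr)
        learner.train_on_task(stream[0])
        scores[lr] = learner.score
    best = max(scores, key=scores.get)
    learner = Learner(best)
    for task in stream:  # the remaining stream is seen exactly once
        learner.train_on_task(task)
    return best

stream = make_stream()
print("end-of-training HPO picks lr =", end_of_training_hpo(stream))
print("first-task HPO picks lr =", first_task_hpo(stream))
```

The end-of-training variant sees the stream once per grid point, which is precisely what a deployed continual learner cannot do; the first-task variant respects the single-pass constraint at the risk of a setting that suits only the first task.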
Problem

Research questions and friction points this paper is trying to address.

How to perform hyperparameter optimization under continual learning's single-pass constraint
Comparison of realistic HPO frameworks for CL
Need for new benchmarks that can discriminate between HPO frameworks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Systematic comparison of realistic, CL-compatible HPO frameworks
Evidence that no framework, including end-of-training HPO, consistently outperforms the rest on popular CL benchmarks
Recommendation to select an HPO framework on factors beyond accuracy, such as compute efficiency
👥 Authors
Thomas L. Lee
School of Informatics, University of Edinburgh
Sigrid Passano Hellan
School of Informatics, University of Edinburgh
Linus Ericsson
University of Glasgow
Elliot J. Crowley
Associate Professor, University of Edinburgh
A. Storkey
School of Informatics, University of Edinburgh