🤖 AI Summary
To address the prohibitive cost of retraining from scratch when new and legacy data coexist, this paper proposes an efficient continual training paradigm that reuses the pretrained model and historical data instead of performing full retraining. The authors first systematically quantify the computational benefits of continual training, then introduce an optimization framework spanning four dimensions: (i) warm-start initialization, (ii) elastic weight consolidation (EWC)-inspired regularization, (iii) curriculum-driven sampling of legacy data, and (iv) adaptive learning rate scheduling. Evaluated across multiple computer vision tasks, the approach achieves up to a 2.7× training speedup while matching or surpassing the accuracy of from-scratch training. These results empirically validate the efficiency and practicality of continual training under realistic data-availability constraints, offering a lightweight, deployable pathway for iterative model updates.
📝 Abstract
Continual learning aims to enable models to adapt to new datasets without losing performance on previously learned data, often assuming that prior data is no longer available. However, in many practical scenarios, both old and new data are accessible. In such cases, good performance on both datasets is typically achieved by abandoning the model trained on the previous data and retraining a new model from scratch on both datasets. This training from scratch is computationally expensive. In contrast, methods that leverage the previously trained model and the old data are worth investigating, as they could significantly reduce computational costs. Our evaluation framework quantifies the computational savings of such methods under the constraint that they match or exceed the performance of training from scratch. We identify key optimization aspects -- initialization, regularization, data selection, and hyper-parameters -- each of which can contribute to reducing computational costs. For each aspect, we propose effective first-step methods that already yield substantial computational savings. By combining these methods, we achieve up to 2.7x reductions in computation time across various computer vision tasks, highlighting the potential for further advancements in this area.
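To make the four optimization aspects concrete, here is a minimal NumPy sketch on a toy linear-regression task. This is an illustration of the general recipe, not the paper's implementation: all names, hyper-parameter values, and design choices below are hypothetical. It warm-starts from the old model (initialization), adds a simple proximal penalty toward the old weights as a stand-in for regularization, mixes a random subset of legacy data with the new data (data selection), and uses a smaller learning rate for the continual phase (hyper-parameters).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (hypothetical): noiseless linear regression y = X @ w_true.
d = 5
w_true = rng.normal(size=d)
X_old, X_new = rng.normal(size=(200, d)), rng.normal(size=(100, d))
y_old, y_new = X_old @ w_true, X_new @ w_true

def train(w, X, y, lr, steps, w_anchor=None, reg=0.0):
    """Gradient descent on squared loss, with an optional proximal
    penalty reg * ||w - w_anchor||^2 discouraging drift from the
    previously trained weights."""
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(X)
        if w_anchor is not None:
            grad += 2 * reg * (w - w_anchor)
        w = w - lr * grad
    return w

# Legacy model, trained from scratch on the old data only.
w_old = train(np.zeros(d), X_old, y_old, lr=0.1, steps=200)

# (iii) Data selection: a random subset of legacy data joins the new data.
idx = rng.choice(len(X_old), size=50, replace=False)
X_mix = np.vstack([X_old[idx], X_new])
y_mix = np.concatenate([y_old[idx], y_new])

# (i) Warm start from w_old, (ii) regularize toward w_old, and
# (iv) use a reduced learning rate, since the warm-started model is
# already close to a good solution.
w_cont = train(w_old, X_mix, y_mix, lr=0.05, steps=50,
               w_anchor=w_old, reg=0.01)

# Error of the continually trained model on the new data.
err = np.mean((X_new @ w_cont - y_new) ** 2)
```

The continual phase here runs for far fewer steps than from-scratch training (50 vs. 200), which is the source of the computational savings the paper targets; the proximal penalty is only one of many possible regularizers toward the old model.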