🤖 AI Summary
To mitigate catastrophic forgetting during incremental fine-tuning of foundation models, this paper proposes a lightweight replay sample selection scheme that requires no additional forward passes. The method, mix-cd, prioritizes rehearsal of "collateral damage" samples — those the prior model predicted correctly but the incrementally tuned model now gets wrong — by efficiently estimating the density of such samples from predictions already available during training. Unlike approaches built around a fixed-size memory buffer with auxiliary model inference for sample selection, mix-cd targets a fixed computational budget, reflecting the rising cost of inference relative to storage. Experiments show it outperforms several leading continual learning methods in compute-constrained settings. The implementation is publicly available.
📝 Abstract
Incrementally fine-tuning foundational models on new tasks or domains is now the de facto approach in NLP. A known pitfall of this approach is the *catastrophic forgetting* of prior knowledge that happens during fine-tuning. A common approach to alleviate such forgetting is to rehearse samples from prior tasks during fine-tuning. Several existing works assume a fixed memory buffer to store prior task examples, while relying on inferences (forward passes) with the model at hand for choosing examples for rehearsal from the buffer. However, given the increasing computational cost of model inference and the decreasing cost of data storage, we focus on the setting of rehearsing samples under a fixed computational budget instead of a fixed memory budget. We propose a sampling scheme, `mix-cd`, that prioritizes rehearsal of "collateral damage" samples, which are samples predicted correctly by the prior model but forgotten by the incrementally tuned one. The crux of our scheme is a procedure to efficiently estimate the density of collateral damage samples without incurring additional model inferences. Our approach is computationally efficient, easy to implement, and outperforms several leading continual learning methods in compute-constrained settings. All the code will be publicly available at https://github.com/jybai/mix-cd-rehearsal.
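The rehearsal idea in the abstract can be sketched in a few lines. The following is a hypothetical illustration, not the paper's actual implementation: it assumes prior-task samples are grouped into coarse buckets, that each sample's correctness under the *prior* model was recorded once before fine-tuning, and that correctness under the *current* model is observed for free from the training forward pass on rehearsed samples (so no extra inferences are needed). The class name `MixCDSketch` and all method names are invented for this sketch.

```python
import random
from collections import defaultdict

class MixCDSketch:
    """Hypothetical sketch of collateral-damage-prioritized rehearsal.

    A rehearsed sample counts as "collateral damage" if the prior model
    predicted it correctly but the current (incrementally tuned) model
    does not. Buckets with a higher estimated collateral-damage rate are
    sampled more often, at no extra inference cost.
    """

    def __init__(self, buckets, smoothing=1.0):
        # buckets: dict bucket_id -> list of prior-task samples
        self.buckets = buckets
        self.smoothing = smoothing
        self.cd_hits = defaultdict(float)  # observed collateral damage per bucket
        self.seen = defaultdict(float)     # rehearsed samples per bucket

    def cd_rate(self, b):
        # smoothed estimate of the collateral-damage density in bucket b
        return (self.cd_hits[b] + self.smoothing) / (self.seen[b] + 2 * self.smoothing)

    def sample(self, k, rng=random):
        # draw k rehearsal samples, buckets weighted by estimated CD rate
        ids = list(self.buckets)
        weights = [self.cd_rate(b) for b in ids]
        chosen = rng.choices(ids, weights=weights, k=k)
        return [(b, rng.choice(self.buckets[b])) for b in chosen]

    def update(self, bucket_id, prior_correct, current_correct):
        # feedback from the training forward pass on a rehearsed sample:
        # no additional model inference is performed here
        self.seen[bucket_id] += 1
        if prior_correct and not current_correct:
            self.cd_hits[bucket_id] += 1
```

In use, `update` is called with correctness signals already computed during the training step, and `sample` biases the next rehearsal batch toward regions where forgetting is concentrated; the smoothing term keeps unexplored buckets from being starved.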