Chained Tuning Leads to Biased Forgetting

📅 2024-12-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) suffer from catastrophic forgetting during multi-stage continual fine-tuning, particularly exhibiting systematic loss of safety-aligned knowledge—a phenomenon we term “biased forgetting,” wherein forgetting severity varies significantly across demographic groups. This work introduces the first quantitative metric for biased forgetting, revealing the asymmetry and group sensitivity of safety capability degradation with respect to task ordering. Through controlled safety-knowledge probing, comparative fine-tuning sequence experiments, and ablation studies of replay- and regularization-based mitigation strategies, we empirically demonstrate that reversing the fine-tuning order reduces overall safety forgetting by 37% and improves inter-group fairness in forgetting by 52%. Our contributions include: (1) a formal, quantifiable definition of biased forgetting; (2) empirical evidence of task-order–dependent safety erosion across populations; and (3) a reproducible evaluation framework and actionable intervention strategies for safety-aware continual learning.

📝 Abstract
Large language models (LLMs) are often fine-tuned for use on downstream tasks, though this can degrade capabilities learned during previous training. This phenomenon, often referred to as catastrophic forgetting, has important potential implications for the safety of deployed models. In this work, we first show that models trained on downstream tasks forget their safety tuning to a greater extent than models trained in the opposite order. Second, we show that forgetting disproportionately impacts safety information about certain groups. To quantify this phenomenon, we define a new metric we term biased forgetting. We conduct a systematic evaluation of the effects of task ordering on forgetting and apply mitigations that can help the model recover from the forgetting observed. We hope our findings can better inform methods for chaining the fine-tuning of LLMs in continual learning settings to enable training of safer and less toxic models.
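The abstract describes a metric that quantifies how unevenly safety forgetting falls on different demographic groups. The paper's exact formula is not given here, so the following is only a minimal sketch of one plausible formulation: per-group forgetting as the drop in a safety score after downstream fine-tuning, and "biased forgetting" as the disparity in that drop across groups. All names, scores, and the disparity formula are illustrative assumptions, not the paper's definition.

```python
# Illustrative sketch (not the paper's metric): quantify group-dependent
# safety forgetting from per-group safety scores measured before and
# after downstream fine-tuning.

def forgetting(before: dict, after: dict) -> dict:
    """Per-group forgetting: drop in safety score after fine-tuning."""
    return {g: before[g] - after[g] for g in before}

def biased_forgetting(before: dict, after: dict) -> float:
    """One possible disparity measure: max minus min forgetting across groups."""
    drops = forgetting(before, after)
    return max(drops.values()) - min(drops.values())

# Hypothetical safety-probe accuracies per group.
before = {"group_a": 0.92, "group_b": 0.90, "group_c": 0.91}
after  = {"group_a": 0.85, "group_b": 0.70, "group_c": 0.88}

print(forgetting(before, after))          # group_b forgets far more than the others
print(round(biased_forgetting(before, after), 2))  # → 0.17
```

Under this sketch, a model whose safety scores drop uniformly across groups would score near zero, while the example above flags group_b as disproportionately affected, which is the asymmetry the paper's task-ordering experiments are probing.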
Problem

Research questions and friction points this paper is trying to address.

Catastrophic Forgetting
Language Model Safety
Continuous Adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Biased Forgetting
Continuous Learning
Safety and Fairness