FIT: Defying Catastrophic Forgetting in Continual LLM Unlearning

📅 2026-01-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the utility degradation and catastrophic forgetting experienced by large language models under frequent, continuous deletion requests. To this end, the authors propose FIT, the first framework designed to enable efficient and stable continual unlearning. FIT integrates data filtering, importance-aware fine-tuning, and target-layer attribution to ensure effective forgetting while preserving model performance. The paper further introduces the PCH benchmark along with symmetric evaluation metrics, Forget Degree (F.D.) and Retain Utility (R.U.), to rigorously assess unlearning efficacy and knowledge retention. Experiments across four open-source LLMs demonstrate that FIT significantly outperforms existing methods, maintaining high accuracy on benchmarks such as MMLU, CommonsenseQA, and GSM8K, while effectively resisting relearning and quantization-based recovery attacks.

📝 Abstract
Large language models (LLMs) demonstrate impressive capabilities across diverse tasks but raise concerns about privacy, copyright, and harmful materials. Existing LLM unlearning methods rarely consider the continual and high-volume nature of real-world deletion requests, which can cause utility degradation and catastrophic forgetting as requests accumulate. To address this challenge, we introduce FIT, a framework for continual unlearning that handles large numbers of deletion requests while maintaining robustness against both catastrophic forgetting and post-unlearning recovery. FIT mitigates degradation through rigorous data Filtering, Importance-aware updates, and Targeted layer attribution, enabling stable performance across long sequences of unlearning operations and achieving a favorable balance between forgetting effectiveness and utility retention. To support realistic evaluation, we present PCH, a benchmark covering Personal information, Copyright, and Harmful content in sequential deletion scenarios, along with two symmetric metrics, Forget Degree (F.D.) and Retain Utility (R.U.), which jointly assess forgetting quality and utility preservation. Extensive experiments on four open-source LLMs with hundreds of deletion requests show that FIT achieves the strongest trade-off between F.D. and R.U., surpasses existing methods on MMLU, CommonsenseQA, and GSM8K, and remains resistant against both relearning and quantization recovery attacks.
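To make the "importance-aware updates" idea concrete, here is a minimal, hypothetical sketch (not the paper's actual algorithm, whose details are not given in this summary): perform gradient ascent on the forget data, but damp or mask the update for parameters that matter most to retained data, using a squared retain-gradient as a Fisher-style importance proxy. All function and variable names here are illustrative.

```python
import numpy as np

def importance_weighted_unlearning_step(params, forget_grad, retain_grad,
                                        lr=0.1, update_frac=0.5):
    """One illustrative continual-unlearning step (hypothetical sketch).

    Ascend the loss on the forget data, but (1) only touch the fraction of
    parameters least important to the retain data, and (2) damp updates in
    proportion to each parameter's importance, approximated here by the
    squared retain-data gradient (a Fisher-information-style proxy).
    """
    importance = retain_grad ** 2                    # importance proxy per parameter
    threshold = np.quantile(importance, update_frac) # protect the most important params
    mask = importance <= threshold                   # update only low-importance params
    scale = 1.0 / (1.0 + importance)                 # damp updates on important params
    # Gradient ASCENT on the forget loss, restricted and damped as above.
    return params + lr * mask * scale * forget_grad

# Toy example: the last two parameters carry most of the retain signal,
# so they should be left (nearly) untouched by the unlearning step.
params = np.array([1.0, 2.0, 3.0, 4.0])
forget_grad = np.array([0.5, 0.5, 0.5, 0.5])
retain_grad = np.array([0.0, 0.1, 1.0, 2.0])
new_params = importance_weighted_unlearning_step(params, forget_grad, retain_grad)
```

The same masking idea extends naturally to whole layers ("targeted layer attribution"): instead of a per-parameter mask, one could aggregate importance per layer and update only the least retain-critical layers.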
Problem

Research questions and friction points this paper is trying to address.

catastrophic forgetting
continual unlearning
deletion requests
utility degradation
LLM unlearning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Continual Unlearning
Catastrophic Forgetting
Large Language Models
Data Filtering
Targeted Layer Attribution