Not Every Token Needs Forgetting: Selective Unlearning to Limit Change in Utility in Large Language Model Unlearning

📅 2025-06-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing LLM unlearning methods indiscriminately update all target tokens, inadvertently degrading general-purpose knowledge carried by common tokens (e.g., pronouns, prepositions). Method: This work challenges the "all tokens must be unlearned" assumption and proposes selective unlearning: a semantic-relevance-driven mechanism that identifies and unlearns only a sparse subset of tokens strongly associated with privacy-sensitive or copyright-protected content. The method leverages gradient sensitivity and token-level importance scoring, enabling lightweight integration with existing unlearning algorithms (GA, EU, RMU). Results: On two major benchmarks, the approach achieves an average 12.7% improvement in unlearning success rate while preserving 98.4% of performance on retained tasks, outperforming six baselines and striking a principled trade-off between effective unlearning and preservation of general linguistic capabilities.

📝 Abstract
Large Language Model (LLM) unlearning has recently gained significant attention, driven by the need to remove unwanted information, such as private, sensitive, or copyrighted content, from LLMs. However, conventional unlearning approaches indiscriminately update model parameters to forget all tokens in a target document, including common tokens (e.g., pronouns, prepositions, general nouns) that carry general knowledge. In this paper, we highlight that not every token needs forgetting. We propose Selective Unlearning (SU), which identifies a critical subset of tokens within the forgetting set that is relevant to the unwanted information, and unlearns only those tokens. Experiments on two benchmarks and six baseline unlearning algorithms demonstrate that SU not only achieves effective unlearning on the targeted forget data, but also significantly preserves the model's utility in the retaining set.
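The abstract's core idea, identifying a critical token subset and restricting the unlearning loss to it, can be sketched as a toy token-masking routine. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, and exact string matching stands in for the paper's semantic-relevance and gradient-sensitivity scoring.

```python
# Hypothetical sketch of selective unlearning's token masking.
# The paper scores tokens by semantic relevance and gradient sensitivity;
# here, exact matching against a sensitive-term set is a stand-in.

def relevance_scores(tokens, sensitive_terms):
    """Score each token's association with unwanted content.
    Toy rule: 1.0 for a match to a sensitive term, 0.0 otherwise."""
    return [1.0 if t.lower() in sensitive_terms else 0.0 for t in tokens]

def selective_forget_mask(tokens, sensitive_terms, keep_fraction=0.2):
    """Return a 0/1 mask over the top-scoring sparse token subset.
    Only masked-in tokens would receive the unlearning update;
    common tokens (pronouns, prepositions, ...) are left untouched."""
    scores = relevance_scores(tokens, sensitive_terms)
    k = max(1, int(len(tokens) * keep_fraction))
    threshold = sorted(scores, reverse=True)[k - 1]
    return [1 if s >= threshold and s > 0 else 0 for s in scores]

tokens = ["Alice", "lives", "at", "42", "Elm", "Street", "with", "her", "dog"]
mask = selective_forget_mask(tokens, {"alice", "42", "elm", "street"}, 0.5)
# mask selects only the privacy-relevant tokens:
# [1, 0, 0, 1, 1, 1, 0, 0, 0]
```

In a real setup the mask would gate a per-token unlearning loss (e.g., zeroing the gradient-ascent term on excluded positions), which is what lets SU plug into existing unlearning algorithms without modifying them.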
Problem

Research questions and friction points this paper is trying to address.

Conventional unlearning updates parameters to forget every token in a target document
Common tokens (pronouns, prepositions, general nouns) carry general knowledge that is lost in the process
How to unlearn only the tokens tied to unwanted information while preserving model utility
Innovation

Methods, ideas, or system contributions that make the work stand out.

Selective Unlearning (SU) unlearns only a critical subset of tokens
Preserves model utility by leaving common tokens untouched
Identifies the tokens within the forget set that are relevant to the unwanted information