Forget What Matters, Keep the Rest: Selective Unlearning of Informative Tokens

📅 2026-04-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Applying a uniform unlearning loss to a large language model ignores the semantic importance of individual tokens and can degrade model utility unnecessarily. This work proposes Entropy-guided Token Weighting (ETW), a token-level unlearning regularizer that uses the entropy of the model's predictive distribution as an adaptive proxy for token informativeness. Unlike prior token-level approaches, it requires neither ground-truth confidence scores nor external linguistic parsers to distinguish informative tokens from structural ones. ETW modulates the unlearning strength applied to each token during fine-tuning according to that token's predictive entropy. According to the reported experiments, the method unlearns more effectively than existing token-level strategies while better preserving overall model performance.

📝 Abstract

Unlearning in large language models (LLMs) has emerged as a promising safeguard against adversarial behaviors. When the forgetting loss is applied uniformly without considering token-level semantic importance, model utility can be unnecessarily degraded. Recent studies have explored token-wise loss regularizers that prioritize informative tokens, but they largely rely on ground-truth confidence or external linguistic parsers, which limits their ability to capture contextual information or the model's overall predictive state. Intuitively, function words like "the" primarily serve syntactic roles and are highly predictable with little ambiguity, whereas informative words admit multiple plausible alternatives and carry greater uncertainty. Based on this intuition, we propose Entropy-guided Token Weighting (ETW), a token-level unlearning regularizer that uses the entropy of the predictive distribution as a proxy for token informativeness. We demonstrate that informative tokens tend to have higher entropy, whereas structural tokens tend to have lower entropy. This behavior enables ETW to achieve more effective unlearning while better preserving model utility than existing token-level approaches.
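The idea of weighting each token's forgetting loss by its predictive entropy can be illustrated with a minimal sketch. The abstract does not give ETW's exact formulation, so everything below is an assumption made for illustration: the normalization by log(vocabulary size), the gradient-ascent-style forgetting objective, and the function names `entropy_weights` and `weighted_forget_loss` are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_weights(logits_per_token):
    """One weight per token position, in [0, 1].

    Assumption: the weight is the predictive entropy divided by
    log(vocab_size), its maximum possible value. The paper may use
    a different mapping from entropy to weight.
    """
    weights = []
    for logits in logits_per_token:
        h = entropy(softmax(logits))
        weights.append(h / math.log(len(logits)))
    return weights


def weighted_forget_loss(logits_per_token, target_ids):
    """Entropy-weighted forgetting loss (hypothetical form).

    Assumption: forgetting is expressed as minimizing the weighted
    log-likelihood of the target tokens (gradient ascent on NLL).
    High-entropy tokens contribute more; low-entropy structural
    tokens contribute little. Weights are treated as constants.
    """
    weights = entropy_weights(logits_per_token)
    total = 0.0
    for logits, target, w in zip(logits_per_token, target_ids, weights):
        log_p = math.log(softmax(logits)[target])
        total += w * log_p
    return total / len(target_ids)


# Toy example with a 4-word vocabulary:
# position 0 is sharply peaked (a predictable, structural token),
# position 1 is uniform (many plausible alternatives).
example_logits = [[10.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
example_targets = [0, 0]
example_weights = entropy_weights(example_logits)
example_loss = weighted_forget_loss(example_logits, example_targets)
```

In this toy case the peaked position receives a weight close to zero and the uniform position receives the maximum weight, so nearly all of the forgetting signal comes from the uncertain token. In a real training loop the weights would typically be computed from the model's logits without backpropagating through them, which is also an assumption here.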

Problem

Research questions and friction points this paper is trying to address.

unlearning

large language models

token informativeness

model utility

selective forgetting

Innovation

Methods, ideas, or system contributions that make the work stand out.

selective unlearning

token-level regularization

predictive entropy