🤖 AI Summary
This work proposes PALU, a novel framework for targeted forgetting in large language models that addresses the performance degradation commonly caused by global optimization over entire output sequences. Instead of applying blanket interventions, PALU performs localized forgetting along both temporal and lexical dimensions: it selectively suppresses sensitive prefixes to disrupt causal generation chains and flattens the top-k logits within the critical subspace. Both steps are guided by a local entropy maximization objective, enabling precise, prefix-aware removal of sensitive information. Experimental results demonstrate that PALU effectively eliminates targeted content, outperforming state-of-the-art baselines in forgetting efficacy while better preserving general task performance.
📝 Abstract
Machine unlearning aims to remove sensitive knowledge from Large Language Models (LLMs) while maintaining general utility. However, existing approaches typically treat all tokens in a response indiscriminately and enforce uncertainty over the entire vocabulary. This global treatment causes unnecessary utility degradation and extends optimization to content-agnostic regions. To address these limitations, we propose PALU (Prefix-Aware Localized Unlearning), a framework driven by a local entropy maximization objective across both the temporal and vocabulary dimensions. PALU reveals that (i) suppressing the sensitive prefix alone is sufficient to sever the causal generation link, and (ii) flattening only the top-$k$ logits is adequate to maximize uncertainty in the critical subspace. These findings allow PALU to avoid redundant optimization across the full vocabulary and parameter space while minimizing collateral damage to general model performance. Extensive experiments validate that PALU achieves superior forgetting efficacy and utility preservation compared to state-of-the-art baselines.
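To make the top-$k$ flattening idea concrete, here is a minimal sketch of a local entropy-maximization objective restricted to the top-$k$ logits. This is an illustrative reconstruction, not the authors' exact formulation: the function name `topk_flatten_loss`, the choice of `k`, and the use of a plain negative-entropy penalty are all assumptions for exposition.

```python
import numpy as np

def topk_flatten_loss(logits, k=5):
    """Negative entropy of the softmax restricted to the k largest logits.

    Minimizing this value flattens the distribution over the "critical
    subspace" (the top-k candidates) while leaving the rest of the
    vocabulary untouched. Hypothetical sketch of PALU's top-k flattening
    objective, not the paper's exact loss.
    """
    # Indices of the k largest logits (the critical subspace).
    topk_idx = np.argpartition(logits, -k)[-k:]
    topk = logits[topk_idx]
    # Softmax computed only over the top-k subspace.
    p = np.exp(topk - topk.max())
    p /= p.sum()
    # Negative entropy: its minimum, -log(k), is reached when the
    # top-k distribution is exactly uniform (maximally uncertain).
    return float(np.sum(p * np.log(p + 1e-12)))
```

Because the objective touches only $k$ entries rather than the full vocabulary, it avoids the redundant full-vocabulary optimization the abstract criticizes: a peaked top-$k$ distribution yields a loss near $0$, and driving it down toward $-\log k$ flattens exactly the candidates the model was most likely to emit.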