🤖 AI Summary
Large language models (LLMs) risk memorizing sensitive information from their training data and leaking it at inference time; existing unlearning methods mostly rely on fine-tuning the model, which scales poorly and often leaves the underlying knowledge intact. This paper proposes CURE, a dynamic knowledge unlearning framework that operates at inference time and requires no additional training to handle new unlearning requests. CURE retrieves unlearning targets relevant to the model's initial response, then applies a lightweight corrector that verifies whether that response leaks any of the retrieved targets and conditionally rewrites it into a safe answer. Its core idea is to decouple unlearning into a three-stage pipeline of retrieval, verification, and rewriting, enabling scalable, continual unlearning under diverse and evolving requests. Experiments show that CURE substantially reduces both direct and indirect leakage compared to state-of-the-art baselines while preserving generation quality and general-purpose model capabilities.
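To make the three-stage pipeline concrete, here is a minimal Python sketch under stated assumptions: a toy lexical similarity stands in for the retriever, and the paper's trained lightweight corrector is abstracted as a plain callable. All names (`UnlearningTarget`, `retrieve_targets`, `cure_pipeline`) are hypothetical, not taken from the paper.

```python
# A minimal sketch of the retrieve-verify-rewrite pipeline, assuming a toy
# lexical retriever. The corrector (a trained lightweight model in the paper)
# is abstracted below as a plain callable.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class UnlearningTarget:
    """One piece of knowledge the deployed model must no longer reveal."""
    text: str

def lexical_sim(a: str, b: str) -> float:
    """Jaccard word overlap, standing in for a learned dense retriever."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / max(len(sa | sb), 1)

def retrieve_targets(response: str, targets: List[UnlearningTarget],
                     k: int = 3) -> List[UnlearningTarget]:
    """Rank unlearning targets by relevance to the draft response."""
    return sorted(targets, key=lambda t: lexical_sim(response, t.text),
                  reverse=True)[:k]

def cure_pipeline(query: str,
                  generate: Callable[[str], str],
                  corrector: Callable[[str, List[UnlearningTarget]], str],
                  targets: List[UnlearningTarget]) -> str:
    draft = generate(query)                  # 1. original model answers as usual
    refs = retrieve_targets(draft, targets)  # 2. retrieve against the *response*,
                                             #    so indirect leakage is caught too
    return corrector(draft, refs)            # 3. verify and conditionally rewrite
```

Note the design choice this sketch reflects: because retrieval runs against the draft response rather than the input query, new unlearning targets can be added to the index without modifying the corrector, which is what permits continual unlearning without retraining.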
📝 Abstract
Language models trained on web-scale corpora risk memorizing and exposing sensitive information, prompting the need for effective machine unlearning. Prior methods mainly focus on input queries to suppress sensitive outputs, yet this often fails to eliminate the underlying knowledge and limits scalability. To address this, we propose Corrective Unlearning with Retrieved Exclusions (CURE), a novel unlearning framework that verifies model outputs for leakage and revises them into safe responses. Specifically, CURE employs a lightweight corrector that is applied to the original model to verify whether outputs contain target knowledge and to rewrite them if any leakage is detected. To efficiently handle large-scale unlearning requests, CURE retrieves unlearning targets that are relevant to the initial response and provides them as in-context references to the corrector for detection and conditional revision. By leveraging this retrieval augmentation, the corrector can adapt to new unlearning requests without additional training. Extensive evaluations demonstrate that CURE substantially reduces information leakage, even from indirect queries where prior works fall short, while maintaining response quality and general utility. Moreover, it demonstrates robustness under continual unlearning scenarios, making it practical for real-world applications.
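To illustrate the retrieval augmentation described above, here is one plausible way the retrieved exclusions could be formatted as in-context references for the corrector. The prompt template and the function `build_corrector_input` are assumptions for illustration, not reproduced from the paper.

```python
# A hypothetical illustration of in-context conditioning: retrieved
# exclusions are placed in the corrector's input alongside the draft.
from typing import List

CORRECTOR_PROMPT = """\
The following knowledge must NOT appear in the answer:
{references}

Draft answer:
{draft}

If the draft reveals any listed knowledge, rewrite it into a safe response
that withholds it; otherwise, return the draft unchanged.
"""

def build_corrector_input(draft: str, references: List[str]) -> str:
    refs = "\n".join(f"- {r}" for r in references)
    return CORRECTOR_PROMPT.format(references=refs, draft=draft)

# Example: the corrector sees both the draft and the retrieved exclusions,
# so a new unlearning request only requires updating the retrieval index.
print(build_corrector_input(
    draft="Alice's home address is 12 Elm Street.",
    references=["Alice's home address"],
))
```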