AI Summary
Large language models may inadvertently memorize sensitive or copyrighted content, posing privacy and legal risks. Existing unlearning methods typically rely on user-provided forget sets, which are difficult to audit and risk secondary data leakage. To address these challenges, this work proposes MAGE, a framework that requires only a few user-provided anchor points to detect model memorization, construct a weighted local memory graph, and automatically generate targeted supervision signals for unlearning, all without accessing the original training corpus. MAGE introduces the first corpus-free unlearning mechanism based on memory graphs, is model-agnostic, and is compatible with mainstream unlearning algorithms. Experiments on the TOFU and RWKU benchmarks show that MAGE's self-generated supervision achieves forgetting performance comparable to supervision generated with an external reference, while effectively preserving overall model utility.
Abstract
Large language models (LLMs) may memorize sensitive or copyrighted content, raising significant privacy and legal concerns. While machine unlearning has emerged as a potential remedy, prevailing paradigms rely on user-provided forget sets, making unlearning requests difficult to audit and exposing systems to secondary leakage and malicious abuse. We propose MAGE, a Memory-grAph Guided Erasure framework for user-minimized, corpus-free unlearning. Given only a lightweight user anchor that identifies a target entity, MAGE probes the target LLM to recover target-related memorization, organizes it into a weighted local memory graph, and synthesizes scoped supervision for unlearning. MAGE is model-agnostic, can be plugged into standard unlearning methods, and requires no access to the original training corpus. Experiments on two benchmarks, TOFU and RWKU, demonstrate that MAGE's self-generated supervision achieves effective unlearning performance comparable to supervision generated with external reference, while preserving overall utility. These results support a practical and auditable unlearning workflow driven by minimal anchors rather than user-supplied forget corpora.
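The anchor-to-supervision workflow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`build_memory_graph`, `synthesize_forget_set`, `make_fake_probe`), the triple representation, the frequency-based edge weighting, and the `min_weight` threshold are all assumptions made for illustration, and the probe is a deterministic stand-in for repeatedly querying a real target LLM.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical representation: one recalled fact as (subject, relation, object).
Triple = Tuple[str, str, str]


def build_memory_graph(
    anchor: str,
    probe: Callable[[str], Iterable[Triple]],
    n_samples: int = 4,
) -> Dict[Triple, float]:
    """Probe the model n_samples times and weight each recalled triple
    by the fraction of samples in which it appeared (an assumed weighting)."""
    counts = Counter()
    for _ in range(n_samples):
        # set(): count each triple at most once per probe sample
        counts.update(set(probe(anchor)))
    return {triple: c / n_samples for triple, c in counts.items()}


def synthesize_forget_set(
    graph: Dict[Triple, float],
    min_weight: float = 0.5,
) -> List[Tuple[str, str]]:
    """Turn consistently recalled edges into (prompt, answer) pairs
    that an unlearning method could consume as scoped supervision."""
    return [
        (f"What is the {rel} of {subj}?", obj)
        for (subj, rel, obj), weight in sorted(graph.items())
        if weight >= min_weight
    ]


def make_fake_probe() -> Callable[[str], Iterable[Triple]]:
    """Stand-in for sampling a real LLM: replays four canned responses
    about a fictional entity."""
    samples = [
        [("Jane Doe", "birthplace", "Oslo"), ("Jane Doe", "profession", "novelist")],
        [("Jane Doe", "birthplace", "Oslo")],
        [("Jane Doe", "birthplace", "Oslo"), ("Jane Doe", "award", "Fictional Prize")],
        [("Jane Doe", "birthplace", "Oslo"), ("Jane Doe", "profession", "novelist")],
    ]
    it = iter(samples)
    return lambda anchor: next(it)


graph = build_memory_graph("Jane Doe", make_fake_probe(), n_samples=4)
forget_set = synthesize_forget_set(graph, min_weight=0.5)
```

In this toy run the birthplace edge is recalled in every sample and the profession edge in half of them, so both pass the threshold, while the award edge (one sample in four) is excluded. Filtering by recall frequency is one simple way to keep supervision scoped to what the model appears to have memorized; the paper's actual probing and weighting scheme may differ.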