From Anchors to Supervision: Memory-Graph Guided Corpus-Free Unlearning for Large Language Models

📅 2026-04-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Large language models may inadvertently memorize sensitive or copyrighted content, posing privacy and legal risks. Existing unlearning methods typically rely on user-provided forget sets, which make requests difficult to audit and risk secondary data leakage. To address these challenges, this work proposes MAGE, a framework that takes only a lightweight user-provided anchor identifying a target entity, probes the target model to recover what it has memorized about that entity, organizes the results into a weighted local memory graph, and automatically synthesizes scoped supervision signals for unlearning, all without access to the original training corpus. MAGE is model-agnostic and can be plugged into standard unlearning algorithms. Experiments on the TOFU and RWKU benchmarks show that MAGE's self-generated supervision achieves unlearning performance comparable to supervision generated with an external reference, while preserving overall model utility.

📝 Abstract

Large language models (LLMs) may memorize sensitive or copyrighted content, raising significant privacy and legal concerns. While machine unlearning has emerged as a potential remedy, prevailing paradigms rely on user-provided forget sets, making unlearning requests difficult to audit and exposing systems to secondary leakage and malicious abuse. We propose MAGE, a Memory-grAph Guided Erasure framework for user-minimized, corpus-free unlearning. Given only a lightweight user anchor that identifies a target entity, MAGE probes the target LLM to recover target-related memorization, organizes it into a weighted local memory graph, and synthesizes scoped supervision for unlearning. MAGE is model-agnostic, can be plugged into standard unlearning methods, and requires no access to the original training corpus. Experiments on two benchmarks, TOFU and RWKU, demonstrate that MAGE's self-generated supervision achieves effective unlearning performance comparable to supervision generated with external reference, while preserving overall utility. These results support a practical and auditable unlearning workflow driven by minimal anchors rather than user-supplied forget corpora.
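
The abstract names three stages (probe, build a weighted local memory graph, synthesize scoped supervision) but does not specify how each is implemented. The toy sketch below shows only the overall data flow under loud assumptions: the `Model` type, the prompt templates, the frequency-based edge weights, the `min_weight` threshold, and the refusal-style target are all hypothetical choices made for illustration, not MAGE's actual procedure.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical: a "model" is any callable mapping a prompt to sampled
# completions. A real system would wrap an LLM sampling API here.
Model = Callable[[str], List[str]]
Graph = Dict[Tuple[str, str], float]


def probe(model: Model, anchor: str, templates: List[str]) -> Counter:
    """Count how often each completion is recalled for the anchor entity."""
    counts: Counter = Counter()
    for template in templates:
        for completion in model(template.format(entity=anchor)):
            counts[completion.strip()] += 1
    return counts


def build_memory_graph(anchor: str, counts: Counter) -> Graph:
    """Edges (anchor, recalled item) weighted by relative recall frequency.

    Assumption: frequency across samples stands in for memorization strength.
    """
    total = sum(counts.values())
    if total == 0:
        return {}
    return {(anchor, item): n / total for item, n in counts.items()}


def synthesize_supervision(
    graph: Graph, min_weight: float = 0.3, refusal: str = "I don't know."
) -> List[Tuple[str, str]]:
    """Turn sufficiently heavy edges into (prompt, target) forget pairs."""
    pairs = []
    for (entity, item), weight in sorted(graph.items(), key=lambda kv: -kv[1]):
        if weight >= min_weight:
            prompt = f"What does the model recall linking {entity} to {item}?"
            pairs.append((prompt, refusal))
    return pairs


def toy_model(prompt: str) -> List[str]:
    """Stand-in for an LLM that returns canned 'memories'."""
    if "born" in prompt:
        return ["Paris", "Paris", "Lyon"]
    return ["novelist"]


if __name__ == "__main__":
    templates = ["Where was {entity} born?", "What is {entity}'s job?"]
    counts = probe(toy_model, "Jane Doe", templates)
    graph = build_memory_graph("Jane Doe", counts)
    for prompt, target in synthesize_supervision(graph):
        print(prompt, "->", target)
```

In this sketch the resulting pairs would be handed to whichever standard unlearning method is in use. The user supplies only the anchor string, and everything else is derived from the model's own outputs, which is the property the abstract calls corpus-free.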

Problem

Research questions and friction points this paper is trying to address.

machine unlearning

large language models

privacy

forget set

Innovation

Methods, ideas, or system contributions that make the work stand out.

machine unlearning

memory graph

corpus-free

model-agnostic