Editing as Unlearning: Are Knowledge Editing Methods Strong Baselines for Large Language Model Unlearning?

📅 2025-05-26
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses selective knowledge forgetting in large language models (LLMs), proposing a unified “forgetting-as-editing” perspective—framing forgetting as a special case of knowledge editing in which target knowledge is replaced with refusal responses or null outputs. We systematically evaluate mainstream editing methods—including ROME, MEMIT, GRACE, WISE, and AlphaEdit—as forgetting baselines, and introduce two enhancements: a self-improving mechanism via in-context learning and a query-merging strategy. These significantly improve refusal alignment and enable long-sequence forgetting. Experiments show that WISE and AlphaEdit outperform existing dedicated forgetting methods on pretrained knowledge erasure, generating human-preferred refusal responses. With query merging, ROME and MEMIT achieve—for the first time—effective forgetting on long samples. Our approach establishes a new paradigm for controllable LLM forgetting: efficient, general-purpose, and scalable.
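The forgetting-as-editing framing can be sketched as a simple preprocessing step: each query to be forgotten becomes an editing request whose new target is a refusal (or an "empty set" response). This is an illustrative sketch only—the `prompt`/`target_new` field names mirror the request format commonly consumed by locate-then-edit methods such as ROME and MEMIT, but are assumptions here, not the paper's exact interface.

```python
# Hypothetical sketch: framing unlearning samples as knowledge-editing requests.
# Field names ("prompt", "target_new") are illustrative assumptions.

REFUSAL = "I'm sorry, I cannot answer that."
EMPTY = ""  # "empty set" target: the model should output nothing informative


def to_edit_requests(forget_queries, target="refusal"):
    """Turn each query to be forgotten into an editing request whose
    new target is a refusal (or empty) response."""
    new_answer = REFUSAL if target == "refusal" else EMPTY
    return [{"prompt": q, "target_new": new_answer} for q in forget_queries]


requests = to_edit_requests(
    ["Where does Alice live?", "What is Bob's phone number?"]
)
```

An editor such as MEMIT would then apply these requests in batch, so that "forgetting" reduces to an ordinary editing run.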

📝 Abstract
Large Language Model (LLM) unlearning, i.e., selectively removing information from LLMs, is vital for responsible model deployment. In contrast, LLM knowledge editing aims to modify LLM knowledge instead of removing it. Though editing and unlearning seem to be two distinct tasks, we find there is a tight connection between them. In this paper, we conceptualize unlearning as a special case of editing where information is modified to a refusal or "empty set" ($\emptyset$) response, signifying its removal. This paper thus investigates if knowledge editing techniques are strong baselines for LLM unlearning. We evaluate state-of-the-art (SOTA) editing methods (e.g., ROME, MEMIT, GRACE, WISE, and AlphaEdit) against existing unlearning approaches on pretrained and finetuned knowledge. Results show certain editing methods, notably WISE and AlphaEdit, are effective unlearning baselines, especially for pretrained knowledge, and excel in generating human-aligned refusal answers. To better adapt editing methods for unlearning applications, we propose practical recipes including self-improvement and query merging. The former leverages the LLM's own in-context learning ability to craft a more human-aligned unlearning target, and the latter enables ROME and MEMIT to perform well in unlearning longer sample sequences. We advocate for the unlearning community to adopt SOTA editing methods as baselines and explore unlearning from an editing perspective for more holistic LLM memory control.
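The self-improvement recipe described above—using the LLM's own in-context learning to craft a more human-aligned refusal target—might look like the following sketch. Here `generate` stands in for any LLM completion call (it is a hypothetical callable, not a real API), and the prompt wording is an assumption, not the paper's exact template.

```python
# Hypothetical sketch of the "self-improvement" recipe: ask the model itself
# to rewrite a draft refusal into a more human-aligned unlearning target.
# `generate` is a stand-in for any LLM completion function.

def self_improve_target(generate, question, draft_refusal="I cannot answer that."):
    """Use the model's in-context learning to polish the refusal target
    before it is installed by an editing method."""
    prompt = (
        "Rewrite the refusal below so it politely declines to answer "
        "the question, without revealing any information.\n"
        f"Question: {question}\n"
        f"Refusal: {draft_refusal}\n"
        "Improved refusal:"
    )
    return generate(prompt).strip()


# Usage with a stand-in generator that ignores the prompt:
improved = self_improve_target(
    lambda p: " I'm sorry, but I can't share that. ",
    "Where does Alice live?",
)
```

The improved refusal then replaces the naive draft as the `target_new` in the editing request, which is what drives the human-preference gains the paper reports.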
Problem

Research questions and friction points this paper is trying to address.

Investigates if knowledge editing methods can serve as strong baselines for LLM unlearning
Evaluates SOTA editing techniques for effectiveness in unlearning pretrained and finetuned knowledge
Proposes practical adaptations to enhance editing methods for unlearning applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Conceptualizes unlearning as knowledge editing
Evaluates SOTA editing methods for unlearning
Proposes self-improvement and query merging techniques
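The query-merging idea listed above can be sketched as follows: several short queries derived from one long forget sample are merged into a single editing query, so one edit covers the whole sample instead of many fragment-level edits. The separator and grouping below are assumptions for illustration; the paper's exact merging strategy may differ.

```python
# Illustrative sketch of query merging for long-sample unlearning.
# Grouping granularity and separator are assumptions, not the paper's recipe.

def merge_queries(queries, group_size=4, sep=" "):
    """Merge runs of short queries into fewer, longer editing queries,
    so locate-then-edit methods (e.g., ROME/MEMIT) that expect a single
    prompt per edit can handle long samples."""
    return [
        sep.join(queries[i:i + group_size])
        for i in range(0, len(queries), group_size)
    ]


merged = merge_queries(["q1", "q2", "q3", "q4", "q5"], group_size=4)
# Five fragments become two merged queries.
```

Each merged query is then paired with a refusal target and passed to the editor as one request, which is how ROME and MEMIT become viable on long samples.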