Editing as Unlearning: Are Knowledge Editing Methods Strong Baselines for Large Language Model Unlearning?

📅 2025-05-26
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses selective knowledge forgetting in large language models (LLMs), proposing a unified “forgetting-as-editing” perspective—framing forgetting as a special case of knowledge editing in which target knowledge is replaced with refusal responses or null outputs. We systematically evaluate mainstream editing methods—including ROME, MEMIT, GRACE, WISE, and AlphaEdit—as forgetting baselines, and introduce two enhancements: a self-improving mechanism via in-context learning and a query-merging strategy. These significantly improve refusal alignment and enable long-sequence forgetting. Experiments show that WISE and AlphaEdit outperform existing dedicated forgetting methods on pretrained knowledge erasure, generating human-preferred refusal responses. With query merging, ROME and MEMIT achieve—for the first time—effective forgetting on long samples. Our approach establishes a new paradigm for controllable LLM forgetting: efficient, general-purpose, and scalable.
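The forgetting-as-editing framing can be sketched as a simple preprocessing step: each query to be forgotten becomes an editing request whose new target is a refusal (or an "empty set" response). This is an illustrative sketch only—the `prompt`/`target_new` field names mirror the request format commonly consumed by locate-then-edit methods such as ROME and MEMIT, but are assumptions here, not the paper's exact interface.

```python
# Hypothetical sketch: framing unlearning samples as knowledge-editing requests.
# Field names ("prompt", "target_new") are illustrative assumptions.

REFUSAL = "I'm sorry, I cannot answer that."
EMPTY = ""  # "empty set" target: the model should output nothing informative


def to_edit_requests(forget_queries, target="refusal"):
    """Turn each query to be forgotten into an editing request whose
    new target is a refusal (or empty) response."""
    new_answer = REFUSAL if target == "refusal" else EMPTY
    return [{"prompt": q, "target_new": new_answer} for q in forget_queries]


requests = to_edit_requests(
    ["Where does Alice live?", "What is Bob's phone number?"]
)
```

An editor such as MEMIT would then apply these requests in batch, so that "forgetting" reduces to an ordinary editing run.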

📝 Abstract
Large Language Model (LLM) unlearning, i.e., selectively removing information from LLMs, is vital for responsible model deployment. In contrast, LLM knowledge editing aims to modify LLM knowledge instead of removing it. Though editing and unlearning seem to be two distinct tasks, we find there is a tight connection between them. In this paper, we conceptualize unlearning as a special case of editing where information is modified to a refusal or "empty set" ($\emptyset$) response, signifying its removal. This paper thus investigates if knowledge editing techniques are strong baselines for LLM unlearning. We evaluate state-of-the-art (SOTA) editing methods (e.g., ROME, MEMIT, GRACE, WISE, and AlphaEdit) against existing unlearning approaches on pretrained and finetuned knowledge. Results show certain editing methods, notably WISE and AlphaEdit, are effective unlearning baselines, especially for pretrained knowledge, and excel in generating human-aligned refusal answers. To better adapt editing methods for unlearning applications, we propose practical recipes including self-improvement and query merging. The former leverages the LLM's own in-context learning ability to craft a more human-aligned unlearning target, and the latter enables ROME and MEMIT to perform well in unlearning longer sample sequences. We advocate for the unlearning community to adopt SOTA editing methods as baselines and explore unlearning from an editing perspective for more holistic LLM memory control.
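The self-improvement recipe described above—using the LLM's own in-context learning to craft a more human-aligned refusal target—might look like the following sketch. Here `generate` stands in for any LLM completion call (it is a hypothetical callable, not a real API), and the prompt wording is an assumption, not the paper's exact template.

```python
# Hypothetical sketch of the "self-improvement" recipe: ask the model itself
# to rewrite a draft refusal into a more human-aligned unlearning target.
# `generate` is a stand-in for any LLM completion function.

def self_improve_target(generate, question, draft_refusal="I cannot answer that."):
    """Use the model's in-context learning to polish the refusal target
    before it is installed by an editing method."""
    prompt = (
        "Rewrite the refusal below so it politely declines to answer "
        "the question, without revealing any information.\n"
        f"Question: {question}\n"
        f"Refusal: {draft_refusal}\n"
        "Improved refusal:"
    )
    return generate(prompt).strip()


# Usage with a stand-in generator that ignores the prompt:
improved = self_improve_target(
    lambda p: " I'm sorry, but I can't share that. ",
    "Where does Alice live?",
)
```

The improved refusal then replaces the naive draft as the `target_new` in the editing request, which is what drives the human-preference gains the paper reports.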
Problem

Research questions and friction points this paper is trying to address.

Investigates if knowledge editing methods can serve as strong baselines for LLM unlearning
Evaluates SOTA editing techniques for effectiveness in unlearning pretrained and finetuned knowledge
Proposes practical adaptations to enhance editing methods for unlearning applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Conceptualizes unlearning as knowledge editing
Evaluates SOTA editing methods for unlearning
Proposes self-improvement and query merging techniques
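The query-merging idea listed above can be sketched as follows: several short queries derived from one long forget sample are merged into a single editing query, so one edit covers the whole sample instead of many fragment-level edits. The separator and grouping below are assumptions for illustration; the paper's exact merging strategy may differ.

```python
# Illustrative sketch of query merging for long-sample unlearning.
# Grouping granularity and separator are assumptions, not the paper's recipe.

def merge_queries(queries, group_size=4, sep=" "):
    """Merge runs of short queries into fewer, longer editing queries,
    so locate-then-edit methods (e.g., ROME/MEMIT) that expect a single
    prompt per edit can handle long samples."""
    return [
        sep.join(queries[i:i + group_size])
        for i in range(0, len(queries), group_size)
    ]


merged = merge_queries(["q1", "q2", "q3", "q4", "q5"], group_size=4)
# Five fragments become two merged queries.
```

Each merged query is then paired with a refusal target and passed to the editor as one request, which is how ROME and MEMIT become viable on long samples.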