FaithUn: Toward Faithful Forgetting in Language Models by Investigating the Interconnectedness of Knowledge

📅 2025-02-26
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This paper addresses the problem of *unfaithful forgetting*: erroneous deletion (under- or over-forgetting) of knowledge in large language models (LLMs) caused by spurious knowledge associations. The authors first formally define and quantify *superficial unlearning*, in which a model appears to forget a target fact yet retains implicit, functionally equivalent knowledge, or unintentionally erases unrelated knowledge. To rigorously evaluate forgetting fidelity, they introduce **FaithUn**, the first benchmark tailored to realistic question-answering scenarios. They further propose **KLUE**, a novel unlearning method that identifies knowledge-associated neurons via interpretability analysis, then performs fine-grained, context-aware local parameter updates guided by KL-divergence constraints and a knowledge-graph-informed evaluation protocol. Extensive evaluation on FaithUn reveals that mainstream unlearning methods suffer from severe unfaithfulness, while KLUE achieves substantial improvements (+38.2% forgetting accuracy, +41.5% retained-knowledge integrity), offering a practical solution for trustworthy knowledge editing in LLMs.

πŸ“ Abstract
Various studies have attempted to remove sensitive or private knowledge from a language model to prevent its unauthorized exposure. However, prior studies have overlooked the complex and interconnected nature of knowledge, where related knowledge must be carefully examined. Specifically, they have failed to evaluate whether an unlearning method faithfully erases interconnected knowledge that should be removed while retaining knowledge that merely appears relevant but exists in a completely different context. To resolve this problem, we first define a new concept, superficial unlearning, which refers to the phenomenon where an unlearning method either fails to erase the interconnected knowledge it should remove or unintentionally erases irrelevant knowledge. Based on this definition, we introduce a new benchmark, FaithUn, to analyze and evaluate the faithfulness of unlearning in real-world knowledge QA settings. Furthermore, we propose a novel unlearning method, KLUE, which updates only knowledge-related neurons to achieve faithful unlearning. KLUE identifies knowledge neurons using an explainability method and updates only those neurons using selected unforgotten samples. Experimental results demonstrate that widely used unlearning methods fail to ensure faithful unlearning, while our method shows significant effectiveness in real-world QA unlearning.
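The abstract's recipe for KLUE (attribute each neuron's relevance to the forgotten fact, then update only those neurons while constraining drift on retained samples) can be illustrated with a toy sketch. Everything below is an illustrative assumption, not the paper's implementation: a linear softmax "model", gradient-magnitude attribution as a stand-in for the explainability method, and a KL penalty toward the original predictions on retain inputs as a stand-in for the unforgotten-sample constraint.

```python
import numpy as np

def softmax(z):
    z = np.atleast_2d(z)
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def klue_sketch(W, x_forget, y_forget, x_retain,
                top_k=2, lr=0.5, steps=50, kl_weight=0.3):
    """Toy KLUE-style selective unlearning on a linear softmax model.

    1) Attribution: score each input neuron (row of W) by the magnitude of
       the forget-loss gradient; only the top-k rows become editable.
    2) Masked update: gradient-ascend the forget loss on those rows while
       descending a KL penalty toward the frozen retain-set predictions.
    """
    W = W.copy()
    onehot = np.eye(W.shape[1])[y_forget]
    p_before = softmax(x_forget @ W)[0]
    p_retain_ref = softmax(x_retain @ W)          # frozen reference distribution

    # Step 1: gradient attribution on the forget example
    g = np.outer(x_forget, softmax(x_forget @ W)[0] - onehot)
    scores = np.abs(g).sum(axis=1)                # per-neuron attribution
    mask = np.zeros(W.shape[0], dtype=bool)
    mask[np.argsort(scores)[-top_k:]] = True      # only top-k neurons editable

    # Step 2: masked, KL-constrained updates
    for _ in range(steps):
        g_forget = np.outer(x_forget, softmax(x_forget @ W)[0] - onehot)
        g_kl = x_retain.T @ (softmax(x_retain @ W) - p_retain_ref) / len(x_retain)
        update = lr * (g_forget - kl_weight * g_kl)
        W[mask] += update[mask]                   # untouched rows stay identical

    return {"W": W, "mask": mask,
            "p_forget_before": p_before,
            "p_forget_after": softmax(x_forget @ W)[0]}
```

After running, the target fact's probability drops while rows outside the attribution mask are bitwise unchanged, which is the sketch's analogue of faithful (non-superficial) forgetting.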
Problem

Research questions and friction points this paper is trying to address.

Faithful forgetting in language models
Interconnectedness of knowledge removal
Superficial unlearning prevention
Innovation

Methods, ideas, or system contributions that make the work stand out.

Defines superficial unlearning concept
Introduces FaithUn benchmark tool
Proposes KLUE unlearning method