Disentangling Knowledge Representations for Large Language Model Editing

📅 2025-05-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing knowledge editing methods for large language models often inadvertently alter fine-grained, unrelated facts sharing the same subject but differing in relation or object—a problem rooted in semantic entanglement caused by multi-attribute coupling within subject representations. To address this, we propose DiKE, a knowledge representation disentanglement framework that, for the first time, decomposes subject representations into target-relevant and irrelevant components. We further design a closed-form, rank-one parameter update strategy grounded in matrix theory, enabling efficient and minimally invasive editing. We introduce FINE-KED, a fine-grained evaluation benchmark for knowledge editing. Experiments across multiple models demonstrate that DiKE significantly improves preservation of unrelated knowledge, boosts editing F1 by 12.6%, reduces erroneous edits by 47%, and maintains high editing accuracy and generalization capability.

📝 Abstract
Knowledge Editing has emerged as a promising solution for efficiently updating embedded knowledge in large language models (LLMs). While existing approaches demonstrate effectiveness in integrating new knowledge and preserving the original capabilities of LLMs, they fail to maintain fine-grained irrelevant knowledge facts that share the same subject as edited knowledge but differ in relation and object. This challenge arises because subject representations inherently encode multiple attributes, causing the target and fine-grained irrelevant knowledge to become entangled in the representation space, and thus vulnerable to unintended alterations during editing. To address this, we propose DiKE, a novel approach that Disentangles Knowledge representations for LLM Editing. DiKE consists of two key components: a Knowledge Representation Disentanglement (KRD) module that decomposes the subject representation into target-knowledge-related and -unrelated components, and a Disentanglement-based Knowledge Edit (DKE) module that updates only the target-related component while explicitly preserving the unrelated one. We further derive a closed-form, rank-one parameter update based on matrix theory to enable efficient and minimally invasive edits. To rigorously evaluate fine-grained irrelevant knowledge preservation, we construct FINE-KED, a new benchmark comprising fine-grained irrelevant knowledge at different levels of relational similarity to the edited knowledge. Extensive experiments across multiple LLMs demonstrate that DiKE substantially improves fine-grained irrelevant knowledge preservation while maintaining competitive general editing performance.
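The decomposition the KRD module performs can be illustrated with a simple linear-algebra sketch: split a subject's hidden representation into the part lying in a target-relevant subspace and the orthogonal remainder. Note this is only a minimal illustration of the idea, assuming a fixed orthonormal basis `U` for the target-relevant directions; the paper's KRD module learns the decomposition rather than taking such a basis as given.

```python
import numpy as np

def disentangle(h, U):
    """Split a hidden representation h into a target-knowledge-related
    component (inside the subspace spanned by the columns of U) and a
    target-unrelated component (its orthogonal complement).

    U is assumed to have orthonormal columns; this fixed-subspace setup
    is illustrative, not the paper's learned KRD module.
    """
    P = U @ U.T                  # orthogonal projector onto span(U)
    h_related = P @ h            # component to be edited
    h_unrelated = h - h_related  # component explicitly preserved
    return h_related, h_unrelated

rng = np.random.default_rng(0)
d, k = 8, 2
# Random orthonormal basis for a hypothetical target-relevant subspace
U, _ = np.linalg.qr(rng.standard_normal((d, k)))
h = rng.standard_normal(d)
h_rel, h_unrel = disentangle(h, U)
assert np.allclose(h_rel + h_unrel, h)  # exact decomposition
assert np.allclose(U.T @ h_unrel, 0.0)  # unrelated part is orthogonal to span(U)
```

Editing then only needs to act on `h_rel`; because `h_unrel` lies outside the edited subspace, facts encoded there are untouched by construction.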
Problem

Research questions and friction points this paper is trying to address.

Disentangling subject representations to prevent unintended knowledge alterations
Preserving fine-grained irrelevant knowledge during large language model editing
Developing efficient minimally invasive edits for targeted knowledge updates
Innovation

Methods, ideas, or system contributions that make the work stand out.

Disentangles knowledge representations for precise editing
Decomposes subject into target-related and unrelated components
Uses rank-one parameter update for efficient edits
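The rank-one update in the last bullet can be sketched as a least-change weight edit: add a single outer-product term so the edited matrix maps a key vector to a new value vector while leaving other directions minimally perturbed. This is a generic closed-form construction in the spirit of rank-one model editing, not DiKE's exact derivation, which additionally enforces preservation of the disentangled target-unrelated component.

```python
import numpy as np

def rank_one_edit(W, k_star, v_star):
    """Minimal rank-one update so that the edited weight maps k_star to v_star.

    Generic least-change solution: W' = W + (v* - W k*) k*^T / (k*^T k*).
    Illustrative only; DiKE's closed-form update is further constrained
    by its representation disentanglement.
    """
    residual = v_star - W @ k_star
    delta = np.outer(residual, k_star) / (k_star @ k_star)  # rank one
    return W + delta

rng = np.random.default_rng(1)
W = rng.standard_normal((4, 6))
k = rng.standard_normal(6)   # key encoding the edited subject-relation
v = rng.standard_normal(4)   # value encoding the new object
W_new = rank_one_edit(W, k, v)
assert np.allclose(W_new @ k, v)              # edit takes effect
assert np.linalg.matrix_rank(W_new - W) == 1  # change is rank one
```

Because the perturbation has rank one, its effect is confined to inputs with a component along `k_star`, which is what makes such edits cheap to compute and comparatively non-invasive.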