🤖 AI Summary
Current LLM knowledge-editing methods predominantly focus on isolated fact updates, neglecting cascading effects on semantically related knowledge: edited facts remain indirectly inferable, and contextual coherence degrades. To address this, we propose *deep editing*, a setting that examines how edits propagate through connected facts and affect broader contextual knowledge, together with ThinkEval, a framework that builds model-specific knowledge graphs to analyze pre- and post-edit effects on fact persistence and catastrophic forgetting. Using ThinkEval, we construct KnowGIC, a causal chain-based query benchmark of sequentially linked queries, and evaluate five editing techniques (AlphaEdit, RECT, ROME, MEMIT, and PRUNE) across multiple LLMs. Our results reveal a consistent trade-off: all approaches struggle to suppress indirect leakage of the original fact while preserving related knowledge, manifesting as either persistent knowledge leakage or catastrophic forgetting. This work provides an empirical foundation for more trustworthy, interpretable LLM knowledge editing.
📝 Abstract
Model editing has become an important tool for addressing privacy, bias, and misinformation in large language models (LLMs) by enabling knowledge updates without retraining from scratch. However, existing editing techniques often target isolated facts, ignoring ripple effects on related knowledge: edited facts can remain deducible through unchanged causal links, and broader contextual integrity is compromised. For example, changing Harry Potter's school from Hogwarts to Ilvermorny requires reassigning his house from Gryffindor to a suitable alternative while preserving Gryffindor's relationship with Hogwarts. In this work, we present a new model-editing setting, deep editing, to show (1) how editing techniques fail to handle connected facts, allowing original knowledge to sneak through unchanged causal links, and (2) how edits affect broader contextual knowledge. We introduce ThinkEval, a framework that systematically evaluates model-editing techniques by building model-specific knowledge graphs to analyze pre- and post-edit effects on fact persistence and catastrophic forgetting. We present KnowGIC, a benchmark created with ThinkEval, consisting of sequentially linked queries to measure these effects. We evaluate five editing techniques (AlphaEdit, RECT, ROME, MEMIT, and PRUNE) across multiple LLMs and find that they struggle to balance indirect fact suppression with the preservation of related knowledge. Our dataset is available at: https://anonymous.4open.science/r/KnowGIC.
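The Harry Potter example above can be made concrete with a toy sketch of edit propagation over a small knowledge graph. This is purely illustrative: the triple store, the `propagate_edit` function, and the house lists are assumptions for exposition, not the paper's ThinkEval API or the actual KnowGIC data.

```python
# Toy illustration (not the paper's implementation): a deep edit to one fact
# should ripple to dependent facts while leaving unrelated facts intact.

# Knowledge stored as (subject, relation) -> object triples.
facts = {
    ("Harry Potter", "attends"): "Hogwarts",
    ("Harry Potter", "house"): "Gryffindor",
    ("Gryffindor", "house_of"): "Hogwarts",  # unrelated fact: must survive the edit
}

def propagate_edit(kb, subject, new_school, school_houses):
    """Edit the 'attends' fact and ripple the change to the dependent
    'house' fact, using the rule: a person's house must belong to the
    school they attend. Returns a new knowledge base."""
    kb = dict(kb)  # leave the original untouched
    kb[(subject, "attends")] = new_school
    old_house = kb.get((subject, "house"))
    # If the old house does not belong to the new school, reassign it.
    if kb.get((old_house, "house_of")) != new_school:
        kb[(subject, "house")] = school_houses[new_school][0]
    return kb

# Hypothetical house list for the new school.
school_houses = {"Ilvermorny": ["Thunderbird", "Wampus"]}
edited = propagate_edit(facts, "Harry Potter", "Ilvermorny", school_houses)
```

A shallow edit would change only the `attends` triple, leaving `house = Gryffindor` as a causal link through which the original fact (Hogwarts) remains deducible; the deep edit reassigns the house while preserving Gryffindor's own relationship to Hogwarts.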