From Fact Overwriting to Knowledge Evolution: Causal Editing via On-Policy Self-Distillation

📅 2026-05-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes a causal editing paradigm that addresses a structural limitation of conventional knowledge editing: treating large language models as static databases and injecting isolated facts disrupts the logic learned in pretraining, so the edited model contradicts its own update (the paper calls this Epistemic Dissonance). The proposed method, CODE, instead models knowledge evolution through explicit causal narratives and internalizes those causal dynamics into model parameters by coupling causal bootstrapping with asymmetric on-policy self-distillation. Evaluated on LLaMA-3.1 and Qwen-2.5, it reduces the self-refutation rate to 1.8% while reaching up to 83.5% accuracy on multi-hop reasoning, shifting the paradigm from discrete fact injection to coherent, causally grounded knowledge evolution.

📝 Abstract

While Knowledge Editing (KE) enables efficient updates, its dominant Static Fact Overwriting paradigm treats LLMs as discrete databases, forcibly injecting isolated facts. Fracturing pre-trained logical topologies, this triggers Epistemic Dissonance -- a pathology where un-evolved legacy priors force the model to explicitly negate the injected update. Idealized interventions reveal that this is an inherent structural flaw rather than mere algorithmic noise, with a zero-distortion proxy yielding a catastrophic 95.6% self-refutation rate. Given the causally driven nature of real-world knowledge, grounding updates in explicit causal narratives effectively collapses this conflict rate to just 6.6%, underscoring the imperative for a paradigm shift toward Causal Editing. To internalize this evolution, we propose CODE (Causal On-policy Distillation for Editing). By coupling causal bootstrapping with asymmetric on-policy distillation, CODE engraves causal transition logic directly into parametric memory. Experiments on LLaMA-3.1 and Qwen-2.5 show CODE drastically suppresses self-refutation to 1.8% while securing robust multi-hop accuracy (up to 83.5%), seamlessly transforming discrete fact injection into coherent knowledge evolution. Code is available at https://github.com/CrashBugger/CODE.
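
The abstract does not spell out the training objective, so the following is only a generic sketch of what an on-policy distillation loss can look like, not the CODE implementation. It assumes that "asymmetric" means the teacher pass sees the causal narrative in context while the student pass does not, and that the loss is a per-token reverse KL computed on a response sampled from the student. Both assumptions, and the function names `reverse_kl` and `on_policy_distill_loss`, are hypothetical.

```python
import math
from typing import Sequence


def reverse_kl(student_probs: Sequence[float],
               teacher_probs: Sequence[float],
               eps: float = 1e-12) -> float:
    """KL(student || teacher) for a single next-token distribution."""
    return sum(
        p * (math.log(p + eps) - math.log(q + eps))
        for p, q in zip(student_probs, teacher_probs)
        if p > 0.0
    )


def on_policy_distill_loss(student_steps: Sequence[Sequence[float]],
                           teacher_steps: Sequence[Sequence[float]]) -> float:
    """Mean per-token reverse KL over one response sampled from the student.

    Assumption (not from the paper): student_steps[i] is the student's
    next-token distribution at position i WITHOUT the causal narrative in
    context, and teacher_steps[i] is the same model's distribution at the
    same position WITH the narrative in context.
    """
    if len(student_steps) != len(teacher_steps):
        raise ValueError("student and teacher must cover the same positions")
    if not student_steps:
        return 0.0
    total = sum(reverse_kl(s, t) for s, t in zip(student_steps, teacher_steps))
    return total / len(student_steps)


# Toy 3-token vocabulary: index 0 = legacy fact, index 1 = updated fact.
student = [[0.7, 0.2, 0.1]]   # student still prefers the legacy fact
teacher = [[0.1, 0.8, 0.1]]   # narrative-conditioned pass prefers the update
loss = on_policy_distill_loss(student, teacher)
```

In this toy setting the loss is positive while the student still favors the legacy fact and falls to zero once its distribution matches the narrative-conditioned one, which is the sense in which such an objective would push the update into the parameters rather than leaving it in the context.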

Problem

Research questions and friction points this paper is trying to address.

Knowledge Editing

Epistemic Dissonance

Causal Editing

Static Fact Overwriting

Self-Refutation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Causal Editing

On-Policy Self-Distillation

Knowledge Evolution