🤖 AI Summary
Outdated knowledge固化 during large language model (LLM) pretraining causes knowledge conflicts during model editing, severely undermining update accuracy. To address this, we propose the first selective-forgetting-based, conflict-free model editing framework—introducing model unlearning mechanisms systematically into the editing pipeline for the first time—to decouple knowledge updating from language capability retention. Our method integrates gradient-controlled parameter-level forgetting, counterfactual supervision signals, and structured evaluation on Counterfact and ZsRE benchmarks. Experiments on GPT-J and LLaMA-3 demonstrate a 12.7% improvement in editing accuracy, significantly enhanced generalization consistency across unseen facts, and no degradation in generation quality. The core contribution lies in mitigating knowledge conflicts at their root, establishing a novel paradigm for reliable, interpretable, and controllable knowledge updates in LLMs.
📝 Abstract
Large language models (LLMs) often retain outdated or incorrect information from pre-training, which undermines their reliability. While model editing methods have been developed to address such errors without full re-training, they frequently suffer from knowledge conflicts, where outdated information interferes with new knowledge. In this work, we propose Conflict-free Model Editing (CoME), a novel framework that enhances the accuracy of knowledge updates in LLMs by selectively removing outdated knowledge. CoME leverages unlearning to mitigate knowledge interference, allowing new information to be integrated without compromising relevant linguistic features. Through experiments on GPT-J and LLaMA-3 using Counterfact and ZsRE datasets, we demonstrate that CoME improves both editing accuracy and model reliability when applied to existing editing methods. Our results highlight that the targeted removal of outdated knowledge is crucial for enhancing model editing effectiveness and maintaining the model's generative performance.