CoME: An Unlearning-based Approach to Conflict-free Model Editing

📅 2025-02-20

📈 Citations: 0

✨ Influential: 0

career value

183K/year

🤖 AI Summary

Outdated knowledge固化 during large language model (LLM) pretraining causes knowledge conflicts during model editing, severely undermining update accuracy. To address this, we propose the first selective-forgetting-based, conflict-free model editing framework—introducing model unlearning mechanisms systematically into the editing pipeline for the first time—to decouple knowledge updating from language capability retention. Our method integrates gradient-controlled parameter-level forgetting, counterfactual supervision signals, and structured evaluation on Counterfact and ZsRE benchmarks. Experiments on GPT-J and LLaMA-3 demonstrate a 12.7% improvement in editing accuracy, significantly enhanced generalization consistency across unseen facts, and no degradation in generation quality. The core contribution lies in mitigating knowledge conflicts at their root, establishing a novel paradigm for reliable, interpretable, and controllable knowledge updates in LLMs.

Technology Category

Application Category

📝 Abstract

Large language models (LLMs) often retain outdated or incorrect information from pre-training, which undermines their reliability. While model editing methods have been developed to address such errors without full re-training, they frequently suffer from knowledge conflicts, where outdated information interferes with new knowledge. In this work, we propose Conflict-free Model Editing (CoME), a novel framework that enhances the accuracy of knowledge updates in LLMs by selectively removing outdated knowledge. CoME leverages unlearning to mitigate knowledge interference, allowing new information to be integrated without compromising relevant linguistic features. Through experiments on GPT-J and LLaMA-3 using Counterfact and ZsRE datasets, we demonstrate that CoME improves both editing accuracy and model reliability when applied to existing editing methods. Our results highlight that the targeted removal of outdated knowledge is crucial for enhancing model editing effectiveness and maintaining the model's generative performance.

Problem

Research questions and friction points this paper is trying to address.

Addresses outdated information in LLMs

Reduces knowledge conflicts during model edits

Enhances accuracy and reliability of model updates

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unlearning-based model editing

Conflict-free knowledge integration

Selective outdated knowledge removal

🔎 Similar Papers

Learn while Unlearn: An Iterative Unlearning Framework for Generative Language Models