🤖 AI Summary
This study asks whether knowledge editing can effectively correct hallucinations in large language models (LLMs). Method: We introduce HalluEditBench—the first benchmark for real-world hallucination correction—comprising more than 6,000 human-verified hallucination instances across 9 domains and 26 topics. Crucially, we rigorously verify that models actually generate hallucinated answers *before* editing. We evaluate editing methods on five dimensions—Efficacy, Generalization, Portability, Locality, and Robustness—supported by automated hallucination detection, domain-enhanced QA pair construction, multi-metric quantification, and large-scale controlled knowledge-perturbation testing. Contribution/Results: Empirical evaluation shows that state-of-the-art editing methods fully correct only 37.2% of hallucinations on average, with generalization and robustness remaining severely limited. The benchmark provides a reproducible, scalable, and empirically grounded foundation for hallucination-correction research, systematically exposing the current capabilities and limitations of knowledge editing approaches.
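The central methodological point above—that an instance counts only if the *unedited* model already hallucinates on it—can be illustrated with a minimal sketch. This is an illustrative reconstruction, not the paper's actual pipeline: the function names (`is_hallucination`, `filter_pre_edit_hallucinations`) and the simple substring-match check are assumptions for demonstration.

```python
# Hypothetical sketch of the pre-edit hallucination check: a QA pair enters
# the benchmark only if the model's answer BEFORE any editing disagrees with
# the human-verified ground truth. All names here are illustrative.

def is_hallucination(model_answer: str, ground_truth: str) -> bool:
    """Treat the answer as hallucinated if the ground truth does not
    appear anywhere in the model's (normalized) output."""
    return ground_truth.strip().lower() not in model_answer.strip().lower()

def filter_pre_edit_hallucinations(model, qa_pairs):
    """Keep only QA pairs the unedited model answers incorrectly."""
    return [(q, a) for q, a in qa_pairs if is_hallucination(model(q), a)]

# Toy "model" that answers "Paris" to everything, for demonstration.
toy_model = lambda question: "The answer is Paris."
pairs = [("Capital of France?", "Paris"), ("Capital of Japan?", "Tokyo")]

print(filter_pre_edit_hallucinations(toy_model, pairs))
# Only the Japan pair survives: the model already answers the France
# question correctly, so editing it would tell us nothing about correction.
```

In practice a string match like this is too crude for open-ended generation; the summary's mention of "automated hallucination detection" suggests a more robust judge, but the filtering principle is the same.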
📝 Abstract
Large Language Models (LLMs) suffer from hallucinations, i.e., non-factual information in generated content, despite their superior capabilities across tasks. Meanwhile, knowledge editing has emerged as a popular paradigm for correcting erroneous factual knowledge encoded in LLMs, with the advantage of avoiding retraining from scratch. However, a common issue with existing evaluation datasets for knowledge editing is that they do not ensure LLMs actually generate hallucinated answers to the evaluation questions before editing. When LLMs are evaluated on such datasets after being edited by different techniques, the measured performance cannot be directly used to assess how effectively each knowledge editing method corrects hallucinations. Thus, a fundamental question remains insufficiently validated: Can knowledge editing really correct hallucinations in LLMs? We propose HalluEditBench to holistically benchmark knowledge editing methods on correcting real-world hallucinations. First, we rigorously construct a massive hallucination dataset spanning 9 domains, 26 topics, and more than 6,000 hallucinations. Then, we assess the performance of knowledge editing methods holistically on five dimensions: Efficacy, Generalization, Portability, Locality, and Robustness. Through HalluEditBench, we provide new insights into the potential and limitations of different knowledge editing methods in correcting hallucinations, which could inspire future improvements and facilitate progress in the field of knowledge editing.