🤖 AI Summary
Large language models (LLMs) exhibit heterogeneous hallucinations, yet prior work lacks a principled taxonomy distinguishing their root causes. Method: This paper introduces the first systematic dichotomy between hallucinations arising from genuine knowledge gaps (HK-) and those stemming from erroneous activation of existing knowledge (HK+). Through multi-model, cross-dataset experiments, augmented by knowledge probing and human-annotated causal attribution, we empirically establish that HK+ hallucinations are pervasive and highly model-specific. Building on this insight, we propose a paradigm for constructing model-specific hallucination datasets and train a lightweight detector that significantly improves fine-grained hallucination classification and detection accuracy. Contribution/Results: We release the first open-source toolkit supporting fine-grained HK-/HK+ annotation, enabling precise hallucination analysis and targeted mitigation. This work establishes a diagnostic and intervention framework for LLM hallucinations grounded in disentangling their causal mechanisms.
📝 Abstract
Large language models (LLMs) are susceptible to hallucinations -- factually incorrect outputs -- leading to a large body of work on detecting and mitigating such cases. We argue that it is important to distinguish between two types of hallucinations: ones where the model does not hold the correct answer in its parameters, which we term HK-, and ones where the model answers incorrectly despite having the required knowledge, termed HK+. We first find that HK+ hallucinations are prevalent and occur across models and datasets. Then, we demonstrate that distinguishing between these two cases is beneficial for mitigating hallucinations. Importantly, we show that different models hallucinate on different examples, which motivates constructing model-specific hallucination datasets for training detectors. Overall, our findings draw attention to classifying types of hallucinations and provide means to handle them more effectively. The code is available at https://github.com/technion-cs-nlp/hallucination-mitigation.
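The HK-/HK+ distinction can be illustrated with a minimal sketch. The paper's actual probing procedure is more involved; here we assume a simple voting rule over answers sampled under knowledge-probing prompts, and the function name and matching logic are illustrative only, not the authors' implementation:

```python
def classify_hallucination(greedy_answer: str, probe_answers: list[str], gold: str) -> str:
    """Label a model's answer as correct, HK-, or HK+.

    Assumption (not from the paper): exact string match against the gold
    answer serves as a stand-in for the paper's knowledge-probing criterion.
    HK-: no probed sample recovers the gold answer -> knowledge appears absent.
    HK+: some probed sample recovers it -> knowledge present but wrongly activated.
    """
    norm = lambda s: s.strip().lower()
    if norm(greedy_answer) == norm(gold):
        return "correct"  # not a hallucination at all
    knows = any(norm(a) == norm(gold) for a in probe_answers)
    return "HK+" if knows else "HK-"

# Example: the greedy decoding is wrong ("Lyon"), but one probed sample
# produces the gold answer ("Paris"), so this counts as an HK+ hallucination.
label = classify_hallucination("Lyon", ["Paris", "Lyon", "Paris"], "Paris")
```

Under this toy rule, an HK- label indicates mitigation should target missing knowledge (e.g., retrieval or abstention), while HK+ suggests the knowledge exists and the failure lies in how it is activated, which is the distinction the paper argues detectors should exploit.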