🤖 AI Summary
Large language models (LLMs) exhibit heterogeneous hallucinations, yet prior work lacks a principled taxonomy distinguishing their root causes. Method: This paper introduces the first systematic dichotomy between hallucinations arising from genuine knowledge gaps (HK-) and those stemming from erroneous activation of existing knowledge (HK+). Through multi-model, cross-dataset experiments, augmented by knowledge probing and human-annotated causal attribution, we empirically establish that HK+ hallucinations are pervasive and highly model-specific. Building on this insight, we propose a paradigm for constructing model-specific hallucination datasets and train a lightweight detector that significantly improves fine-grained hallucination classification and detection accuracy. Contribution/Results: We release the first open-source toolkit supporting fine-grained HK-/HK+ annotation, enabling precise hallucination analysis and targeted mitigation. This work establishes a diagnostic and intervention framework for LLM hallucinations grounded in disentangling their causal mechanisms.
📝 Abstract
Large language models (LLMs) are susceptible to hallucinations -- factually incorrect outputs -- leading to a large body of work on detecting and mitigating such cases. We argue that it is important to distinguish between two types of hallucinations: ones where the model does not hold the correct answer in its parameters, which we term HK-, and ones where the model answers incorrectly despite having the required knowledge, termed HK+. We first find that HK+ hallucinations are prevalent and occur across models and datasets. Then, we demonstrate that distinguishing between these two cases is beneficial for mitigating hallucinations. Importantly, we show that different models hallucinate on different examples, which motivates constructing model-specific hallucination datasets for training detectors. Overall, our findings draw attention to classifying types of hallucinations and provide means to handle them more effectively. The code is available at https://github.com/technion-cs-nlp/hallucination-mitigation.
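The HK-/HK+ distinction can be illustrated with a minimal sketch. The paper's actual probing procedure is more involved; here we assume a simple voting rule over answers sampled under knowledge-probing prompts, and the function name and matching logic are illustrative only, not the authors' implementation:

```python
def classify_hallucination(greedy_answer: str, probe_answers: list[str], gold: str) -> str:
    """Label a model's answer as correct, HK-, or HK+.

    Assumption (not from the paper): exact string match against the gold
    answer serves as a stand-in for the paper's knowledge-probing criterion.
    HK-: no probed sample recovers the gold answer -> knowledge appears absent.
    HK+: some probed sample recovers it -> knowledge present but wrongly activated.
    """
    norm = lambda s: s.strip().lower()
    if norm(greedy_answer) == norm(gold):
        return "correct"  # not a hallucination at all
    knows = any(norm(a) == norm(gold) for a in probe_answers)
    return "HK+" if knows else "HK-"

# Example: the greedy decoding is wrong ("Lyon"), but one probed sample
# produces the gold answer ("Paris"), so this counts as an HK+ hallucination.
label = classify_hallucination("Lyon", ["Paris", "Lyon", "Paris"], "Paris")
```

Under this toy rule, an HK- label indicates mitigation should target missing knowledge (e.g., retrieval or abstention), while HK+ suggests the knowledge exists and the failure lies in how it is activated, which is the distinction the paper argues detectors should exploit.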