HAD: HAllucination Detection Language Models Based on a Comprehensive Hallucination Taxonomy

📅 2025-10-22
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the pervasive hallucination problem in large language model (LLM) outputs, this paper introduces the first fine-grained, multi-task classification taxonomy covering 11 hallucination types and proposes HAD—a unified framework enabling integrated hallucination detection, erroneous span localization, and content correction. Methodologically, HAD is trained on ~90K synthetically generated samples via instruction tuning and multi-task learning, significantly enhancing cross-domain generalization. Key contributions include: (1) a systematic, principled hallucination taxonomy; (2) an end-to-end interpretable HAD model; and (3) HADTest—a high-quality, human-annotated evaluation benchmark. Extensive experiments demonstrate state-of-the-art performance on major hallucination benchmarks—including HaluEval, FactCHD, and FaithBench—validating HAD’s robustness, generalizability, and effectiveness across diverse domains and hallucination patterns.

📝 Abstract
The increasing reliance on natural language generation (NLG) models, particularly large language models, has raised concerns about the reliability and accuracy of their outputs. A key challenge is hallucination, where models produce plausible but incorrect information. As a result, hallucination detection has become a critical task. In this work, we introduce a comprehensive hallucination taxonomy with 11 categories across various NLG tasks and propose the HAllucination Detection (HAD) models (https://github.com/pku0xff/HAD), which integrate hallucination detection, span-level identification, and correction into a single inference process. Trained on an elaborate synthetic dataset of about 90K samples, our HAD models are versatile and can be applied to various NLG tasks. We also carefully annotate a test set for hallucination detection, called HADTest, which contains 2,248 samples. Evaluations on in-domain and out-of-domain test sets show that our HAD models generally outperform the existing baselines, achieving state-of-the-art results on HaluEval, FactCHD, and FaithBench, confirming their robustness and versatility.
Problem

Research questions and friction points this paper is trying to address.

Detect hallucinations in natural language generation models
Identify and correct incorrect information spans in text
Improve reliability across various NLG tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates detection, identification, and correction in one process
Trained on 90K synthetic samples for versatility
Outperforms baselines on multiple hallucination benchmarks
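The integration described above means a single inference pass returns a detection verdict, the erroneous spans, and their corrections. A minimal sketch of what such a unified output might look like follows; the schema, field names, and taxonomy label are illustrative assumptions, not the paper's actual HAD output format.

```python
# Hypothetical schema for a unified hallucination-detection result:
# one pass yields detection, span localization, and correction.
# Field names and the taxonomy label are assumptions for illustration.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class HallucinationResult:
    hallucinated: bool                  # overall detection verdict
    hallucination_type: Optional[str]   # e.g. one of the taxonomy's 11 categories
    spans: list = field(default_factory=list)        # (start, end) char offsets
    corrections: list = field(default_factory=list)  # replacement text per span

def apply_corrections(text: str, result: HallucinationResult) -> str:
    """Rewrite the text by replacing each flagged span with its correction."""
    if not result.hallucinated:
        return text
    out, last = [], 0
    for (start, end), fix in zip(result.spans, result.corrections):
        out.append(text[last:start])
        out.append(fix)
        last = end
    out.append(text[last:])
    return "".join(out)

# Example: the model flags "1912" as a factual error and supplies "1889".
text = "The Eiffel Tower was completed in 1912."
result = HallucinationResult(
    hallucinated=True,
    hallucination_type="factual_error",  # hypothetical category name
    spans=[(34, 38)],
    corrections=["1889"],
)
print(apply_corrections(text, result))  # → The Eiffel Tower was completed in 1889.
```

The point of the sketch is only the shape of the interface: detection, localization, and correction travel together in one structured result rather than requiring three separate model calls.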
Fan Xu
Wangxuan Institute of Computer Technology, Peking University
Xinyu Hu
Wangxuan Institute of Computer Technology, Peking University
Zhenghan Yu
Wangxuan Institute of Computer Technology, Peking University
Li Lin
Wangxuan Institute of Computer Technology, Peking University
Xu Zhang
Wangxuan Institute of Computer Technology, Peking University
Yang Zhang
Alibaba Group
Wei Zhou
Alibaba Group
Jinjie Gu
Ant Group
Machine Learning, Recommendation
Xiaojun Wan
Peking University
Natural Language Processing, Text Mining, Artificial Intelligence