AutoHall: Automated Hallucination Dataset Generation for Large Language Models

📅 2023-09-30
🏛️ arXiv.org
📈 Citations: 12
Influential: 2
🤖 AI Summary
Hallucination detection in large language models (LLMs) faces key bottlenecks: high annotation costs, datasets tied to specific models, and reliance on white-box access or supervised signals. Method: We propose a zero-resource, black-box hallucination detection paradigm. Our approach introduces the first automated framework for constructing hallucination datasets from fact-checking corpora, combining prompt-driven self-contradiction checking, black-box response analysis, and cross-model comparison of hallucination patterns, requiring neither human annotation nor internal model access. Contribution/Results: Experiments demonstrate significant improvements over state-of-the-art baselines across major open- and closed-source LLMs. Crucially, our method systematically uncovers structural differences across models in hallucination types and prevalence, revealing previously uncharacterized variation. This enables scalable, model-agnostic evaluation of LLM reliability, advancing trustworthy AI assessment.
📝 Abstract
While large language models (LLMs) have garnered widespread application across various domains thanks to their powerful language understanding and generation capabilities, research on detecting non-factual or hallucinatory content generated by LLMs remains scarce. Currently, a significant challenge in hallucination detection is the time-consuming and expensive manual annotation of hallucinatory generations. To address this issue, this paper first introduces AutoHall, a method for automatically constructing model-specific hallucination datasets based on existing fact-checking datasets. Furthermore, we propose a zero-resource and black-box hallucination detection method based on self-contradiction. We conduct experiments on prevalent open- and closed-source LLMs, achieving superior hallucination detection performance compared to extant baselines. Moreover, our experiments reveal variations in hallucination proportions and types among different models.
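As a rough illustration of the dataset-construction pipeline the abstract describes, here is a minimal sketch of the AutoHall idea: take labeled claims from an existing fact-checking dataset, have the black-box LLM generate a reference for each claim, and label that reference as hallucinated when the model's own claim verdict disagrees with the gold label. The helper `query_llm`, the prompt wording, and the True/False labeling scheme are illustrative assumptions, not the paper's exact implementation.

```python
def query_llm(prompt: str) -> str:
    """Placeholder for a black-box LLM call (e.g., any chat-completion API)."""
    raise NotImplementedError


def build_hallucination_dataset(claims):
    """`claims` is an iterable of (claim_text, gold_label) pairs taken
    from an existing fact-checking dataset (e.g., True/False labels)."""
    dataset = []
    for claim, gold_label in claims:
        # Step 1: ask the model to produce a supporting reference passage.
        reference = query_llm(
            f"Generate a short reference passage about: {claim}"
        )
        # Step 2: ask the same model to judge the claim given its own reference.
        verdict = query_llm(
            f"Reference: {reference}\nClaim: {claim}\n"
            "Based only on the reference, is the claim True or False?"
        ).strip().lower()
        # Step 3: if the model's verdict disagrees with the gold label,
        # treat the generated reference as hallucinated. No human
        # annotation is needed; the fact-checking labels do the work.
        is_hallucinated = verdict != str(gold_label).lower()
        dataset.append(
            {"claim": claim, "reference": reference,
             "hallucinated": is_hallucinated}
        )
    return dataset
```

Because the labels come for free from the fact-checking corpus, the resulting dataset is specific to whichever model generated the references, which is exactly the model-specificity the paper highlights.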
Problem

Research questions and friction points this paper is trying to address.

Automatically generates model-specific hallucination datasets for LLMs
Addresses the costly, time-consuming manual annotation required for hallucination detection
Introduces a zero-resource, black-box method for detecting LLM hallucinations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated generation of model-specific hallucination datasets
Zero-resource, black-box detection using self-contradiction analysis (see the sketch below)
Leveraging existing fact-checking datasets to reduce annotation costs
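The self-contradiction idea can be made concrete with a short sketch: re-sample alternative references for the same claim from the black-box model and flag the original reference as hallucinated when enough samples contradict it. The `query_llm` stub, the prompts, the sample count, and the decision threshold are all illustrative assumptions, not the paper's exact procedure.

```python
def query_llm(prompt: str) -> str:
    """Placeholder for a black-box LLM call (same stub as above)."""
    raise NotImplementedError


def detect_hallucination(claim: str, reference: str,
                         num_samples: int = 5,
                         threshold: float = 0.5) -> bool:
    """Flag `reference` as hallucinated if it is contradicted by a
    sufficient fraction of re-sampled references for the same claim."""
    contradictions = 0
    for _ in range(num_samples):
        # Re-sample an alternative reference from the same black-box
        # model (temperature > 0, so samples differ between calls).
        alt = query_llm(f"Generate a short reference passage about: {claim}")
        # Ask the model itself whether the two passages conflict; no
        # external knowledge base or model internals are consulted.
        answer = query_llm(
            f"Passage A: {reference}\nPassage B: {alt}\n"
            "Do these passages contradict each other? Answer Yes or No."
        )
        if answer.strip().lower().startswith("yes"):
            contradictions += 1
    # Factual content tends to be reproduced consistently across samples,
    # while hallucinations tend to contradict their own re-samples.
    return contradictions / num_samples >= threshold
```

Because only generated text is compared against other generated text, the check requires neither labeled data nor access to model weights or logits, which is what makes the method zero-resource and black-box.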
Zouying Cao
Shanghai Jiao Tong University
Natural Language Processing · Large Language Models · Reinforcement Learning
Yifei Yang
Shanghai Jiao Tong University
Natural Language Processing
Hai Zhao
AGI Institute, School of Computer Science, Shanghai Jiao Tong University, Shanghai 200240, China, and also with the Shanghai Key Laboratory of Trusted Data Circulation and Governance in Web3, Shanghai 200240, China