🤖 AI Summary
Hallucination detection in large language models (LLMs) faces key bottlenecks: high annotation costs, dataset-model specificity, and reliance on white-box access or supervised signals. Method: We propose a zero-resource, black-box hallucination detection paradigm. Our approach introduces the first automated framework for constructing hallucination datasets from fact-checking corpora, integrating prompt-engineering-driven self-consistency verification, black-box response analysis, and cross-model hallucination pattern comparison, requiring neither human annotation nor internal model access. Contribution/Results: Experiments demonstrate significant improvements over state-of-the-art baselines across major open- and closed-source LLMs. Crucially, our method systematically uncovers structural differences across models in hallucination types and prevalence, revealing previously uncharacterized variation. This enables scalable, model-agnostic evaluation of LLM reliability, advancing trustworthy AI assessment.
📝 Abstract
While large language models (LLMs) have seen widespread application across various domains thanks to their powerful language understanding and generation capabilities, research on detecting non-factual or hallucinatory content generated by LLMs remains scarce. One significant obstacle in hallucination detection is the time-consuming and expensive manual annotation of hallucinated generations. To address this issue, this paper first introduces AutoHall, a method for automatically constructing model-specific hallucination datasets from existing fact-checking datasets. Furthermore, we propose a zero-resource, black-box hallucination detection method based on self-contradiction. We conduct experiments on prevalent open- and closed-source LLMs, achieving superior hallucination detection performance compared to existing baselines. Moreover, our experiments reveal variations in hallucination proportions and types among different models.