🤖 AI Summary
Hallucination detection in closed-source large language models (LLMs) faces three key challenges: absence of reference answers, difficulty in modeling query–response alignment, and lack of cross-domain benchmarks. Method: We propose the first reference-free, model-agnostic, lightweight hallucination detection framework. It jointly models inter-response consistency and multi-granularity query–response alignment, employs contrastive learning to train a binary classifier, and adopts a hybrid data construction paradigm combining synthetic data augmentation with human verification. Contributions/Results: (1) A novel dual-consistency modeling paradigm; (2) HalluCounterEval—the first large-scale, cross-domain, multi-source hallucination evaluation benchmark; (3) State-of-the-art performance across diverse domains, achieving >90% average detection confidence—significantly outperforming existing methods.
📝 Abstract
Response-consistency-based reference-free hallucination detection (RFHD) methods do not depend on internal model states such as generation probabilities or gradients, which grey-box methods typically rely on but which are inaccessible in closed-source LLMs. However, their inability to capture query-response alignment patterns often results in lower detection accuracy. Additionally, the lack of large-scale benchmark datasets spanning diverse domains remains a challenge, as most existing datasets are limited in size and scope. To this end, we propose HalluCounter, a novel reference-free hallucination detection method that exploits both response-response and query-response consistency and alignment patterns. These signals train a classifier that detects hallucinations and returns both a confidence score and an optimal response for each user query. Furthermore, we introduce HalluCounterEval, a benchmark dataset comprising both synthetically generated and human-curated samples across multiple domains. Our method outperforms state-of-the-art approaches by a significant margin, achieving over 90% average confidence in hallucination detection across datasets.
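To make the dual-consistency idea concrete, here is a minimal sketch of reference-free detection from sampled responses. It assumes the responses have already been drawn from the target LLM; the token-overlap similarity, feature weights, and threshold are illustrative stand-ins for the paper's trained classifier, not its actual implementation.

```python
import re
from itertools import combinations

def tokens(s: str) -> set[str]:
    """Lowercased word tokens, punctuation stripped (illustrative tokenizer)."""
    return set(re.findall(r"[a-z0-9']+", s.lower()))

def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two strings."""
    sa, sb = tokens(a), tokens(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def consistency_features(query: str, responses: list[str]) -> dict[str, float]:
    """Response-response and query-response agreement features."""
    rr = [jaccard(x, y) for x, y in combinations(responses, 2)]
    qr = [jaccard(query, r) for r in responses]
    return {
        "rr_mean": sum(rr) / len(rr),  # inter-response consistency
        "qr_mean": sum(qr) / len(qr),  # query-response alignment
    }

def detect_hallucination(query: str, responses: list[str],
                         threshold: float = 0.5):
    """Flag a likely hallucination when agreement is low.

    The hand-set weights and threshold stand in for the trained
    binary classifier described in the paper.
    """
    feats = consistency_features(query, responses)
    score = 0.7 * feats["rr_mean"] + 0.3 * feats["qr_mean"]
    # Return the response most consistent with the others as "optimal".
    best = max(responses, key=lambda r: sum(jaccard(r, o) for o in responses))
    return score < threshold, score, best
```

Mutually consistent responses yield a high agreement score and are accepted, while divergent samples to the same query drive the score down and trigger the hallucination flag; a learned classifier over richer, multi-granularity features plays this role in the actual framework.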