🤖 AI Summary
Hallucination detection in closed-source large language models (LLMs) faces three key challenges: absence of reference answers, difficulty in modeling query–response alignment, and lack of cross-domain benchmarks. Method: We propose the first reference-free, model-agnostic, lightweight hallucination detection framework. It jointly models inter-response consistency and multi-granularity query–response alignment, employs contrastive learning to train a binary classifier, and adopts a hybrid data construction paradigm combining synthetic data augmentation with human verification. Contributions/Results: (1) A novel dual-consistency modeling paradigm; (2) HalluCounterEval—the first large-scale, cross-domain, multi-source hallucination evaluation benchmark; (3) State-of-the-art performance across diverse domains, achieving >90% average detection confidence—significantly outperforming existing methods.
📝 Abstract
Response-consistency-based reference-free hallucination detection (RFHD) methods do not depend on internal model states such as generation probabilities or gradients, which grey-box methods typically rely on but which are inaccessible in closed-source LLMs. However, their inability to capture query-response alignment patterns often results in lower detection accuracy. Additionally, the lack of large-scale benchmark datasets spanning diverse domains remains a challenge, as most existing datasets are limited in size and scope. To this end, we propose HalluCounter, a novel reference-free hallucination detection method that exploits both response-response and query-response consistency and alignment patterns. These signals train a classifier that detects hallucinations and returns both a confidence score and an optimal response for each user query. Furthermore, we introduce HalluCounterEval, a benchmark dataset comprising both synthetically generated and human-curated samples across multiple domains. Our method outperforms state-of-the-art approaches by a significant margin, achieving over 90% average confidence in hallucination detection across datasets.
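To make the dual-consistency idea concrete, here is a minimal sketch of reference-free detection from sampled responses. It assumes the responses have already been drawn from the target LLM; the token-overlap similarity, feature weights, and threshold are illustrative stand-ins for the paper's trained classifier, not its actual implementation.

```python
import re
from itertools import combinations

def tokens(s: str) -> set[str]:
    """Lowercased word tokens, punctuation stripped (illustrative tokenizer)."""
    return set(re.findall(r"[a-z0-9']+", s.lower()))

def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two strings."""
    sa, sb = tokens(a), tokens(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def consistency_features(query: str, responses: list[str]) -> dict[str, float]:
    """Response-response and query-response agreement features."""
    rr = [jaccard(x, y) for x, y in combinations(responses, 2)]
    qr = [jaccard(query, r) for r in responses]
    return {
        "rr_mean": sum(rr) / len(rr),  # inter-response consistency
        "qr_mean": sum(qr) / len(qr),  # query-response alignment
    }

def detect_hallucination(query: str, responses: list[str],
                         threshold: float = 0.5):
    """Flag a likely hallucination when agreement is low.

    The hand-set weights and threshold stand in for the trained
    binary classifier described in the paper.
    """
    feats = consistency_features(query, responses)
    score = 0.7 * feats["rr_mean"] + 0.3 * feats["qr_mean"]
    # Return the response most consistent with the others as "optimal".
    best = max(responses, key=lambda r: sum(jaccard(r, o) for o in responses))
    return score < threshold, score, best
```

Mutually consistent responses yield a high agreement score and are accepted, while divergent samples to the same query drive the score down and trigger the hallucination flag; a learned classifier over richer, multi-granularity features plays this role in the actual framework.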