🤖 AI Summary
Hallucinations in large language models (LLMs) pose serious threats to the safety and reliability of downstream applications. To address this, we introduce the first open-source Python toolkit for LLM hallucination detection. Our core method is a response-level confidence scoring system grounded in uncertainty quantification: it integrates multiple state-of-the-art uncertainty estimation techniques—including logit entropy, sampling variance, and calibration-aware confidence—to produce interpretable, normalized confidence scores in the [0,1] range. The toolkit is designed for plug-and-play deployment, modular extensibility, and seamless integration with mainstream LLM frameworks (e.g., Hugging Face Transformers, vLLM). Extensive experiments across multiple benchmark datasets demonstrate that our approach significantly improves hallucination detection accuracy, achieving an average +12.3% F1-score gain over baseline methods. This advancement enhances both the trustworthiness of generated content and the operational safety of LLM deployments.
📝 Abstract
Hallucinations, defined as instances where Large Language Models (LLMs) generate false or misleading content, pose a significant challenge that impacts the safety and trust of downstream applications. We introduce UQLM, a Python package for LLM hallucination detection using state-of-the-art uncertainty quantification (UQ) techniques. This toolkit offers a suite of UQ-based scorers that compute response-level confidence scores ranging from 0 to 1. This library provides an off-the-shelf solution for UQ-based hallucination detection that can be easily integrated to enhance the reliability of LLM outputs.