🤖 AI Summary
Retrieval-augmented generation (RAG) systems remain prone to hallucination even when provided with correct and sufficient external context, and existing hallucination detection methods suffer from poor generalizability due to heavy reliance on extensive hyperparameter tuning.
Method: We propose a hyperparameter-agnostic hallucination detection framework that jointly models two signals: external context utilization, quantified via the distributional distance between retrieved and generated content, and internal knowledge reliance, captured by how token predictions evolve across Transformer layers. A statistical validation mechanism augments these measurements for added robustness and interpretability.
Contribution/Results: Evaluated on multiple RAG hallucination benchmarks with four open-source large language models, our method improves AUROC by up to 13% over prior approaches and remains robust under degraded retrieval quality. It offers an efficient, lightweight, plug-and-play solution for trustworthy RAG deployment.
📝 Abstract
Retrieval-Augmented Generation (RAG) aims to mitigate hallucinations in large language models (LLMs) by grounding responses in retrieved documents. Yet, RAG-based LLMs still hallucinate even when provided with correct and sufficient context. A growing line of work suggests that this stems from an imbalance between how models use external context and their internal knowledge, and several approaches have attempted to quantify these signals for hallucination detection. However, existing methods require extensive hyperparameter tuning, limiting their generalizability. We propose LUMINA, a novel framework that detects hallucinations in RAG systems through context-knowledge signals: external context utilization is quantified via distributional distance, while internal knowledge utilization is measured by tracking how predicted tokens evolve across transformer layers. We further introduce a framework for statistically validating these measurements. Experiments on common RAG hallucination benchmarks and four open-source LLMs show that LUMINA achieves consistently high AUROC and AUPRC scores, outperforming prior utilization-based methods by up to +13% AUROC on HalluRAG. Moreover, LUMINA remains robust under relaxed assumptions about retrieval quality and model matching, offering both effectiveness and practicality.
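To make the two context-knowledge signals concrete, here is a minimal toy sketch, not LUMINA's actual implementation: the Jensen-Shannon distance over token frequencies stands in for the paper's distributional distance, and the layerwise "convergence depth" stands in for its tracking of predicted tokens across Transformer layers. All function names and the unigram formulation are illustrative assumptions.

```python
import math
from collections import Counter

def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2, so bounded in [0, 1])
    between two discrete distributions given as dicts token -> prob."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}
    def kl(a):
        return sum(a.get(k, 0.0) * math.log2(a.get(k, 0.0) / m[k])
                   for k in keys if a.get(k, 0.0) > 0)
    return 0.5 * kl(p) + 0.5 * kl(q)

def token_distribution(tokens):
    """Unigram distribution of a token sequence (toy proxy for a
    model-derived distribution over retrieved or generated content)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def external_utilization(context_tokens, generated_tokens):
    """External-context signal: 1 - JSD between context and generation
    distributions; higher means the generation stays closer to the context."""
    return 1.0 - js_divergence(token_distribution(context_tokens),
                               token_distribution(generated_tokens))

def internal_reliance(layerwise_top_tokens, final_token):
    """Internal-knowledge signal (logit-lens style): fraction of trailing
    layers whose top prediction already equals the final output token.
    Earlier convergence suggests the answer came from parametric knowledge."""
    depth = 0
    for tok in reversed(layerwise_top_tokens):
        if tok != final_token:
            break
        depth += 1
    return depth / len(layerwise_top_tokens)
```

In a hallucination detector these two scores would be combined (e.g., a generation with low external utilization but high internal reliance is flagged as likely ungrounded); LUMINA's contribution is doing this without per-dataset hyperparameter tuning, with statistical validation of the measurements.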