🤖 AI Summary
This work addresses closed-domain hallucination in retrieval-augmented generation (RAG) systems, where a large language model generates content that is unsupported by the retrieved context. The authors show that token-level hallucination signals derived from the model's internal states can serve as a direct training signal rather than only as a post-hoc check. They introduce RAGognize, a dataset of naturally occurring closed-domain hallucinations with token-level annotations, and RAGognizer, a fine-tuning approach that integrates a lightweight detection head into the LLM and jointly optimizes language modeling and hallucination detection. The joint objective pushes the model to make its internal states more separable with respect to hallucinations while it still learns to generate well-formed responses. Across multiple benchmarks, the method achieves state-of-the-art token-level hallucination detection and substantially reduces hallucination rates during generation, without degrading language quality or relevance.
📝 Abstract
Retrieval-Augmented Generation (RAG) is widely used to augment the input to Large Language Models (LLMs) with external information, such as recent or domain-specific knowledge. Nonetheless, current models still produce closed-domain hallucinations and generate content that is unsupported by the retrieved context. Current detection approaches typically treat hallucination as a post-hoc problem, relying on black-box consistency checks or probes over frozen internal representations. In this work, we demonstrate that hallucination detection based on internal state representation can also serve as a direct training signal. We introduce RAGognize, a dataset of naturally occurring closed-domain hallucinations with token-level annotations, and RAGognizer, a hallucination-aware fine-tuning approach that integrates a lightweight detection head into an LLM, allowing for the joint optimization of language modeling and hallucination detection. This joint objective forces the model to improve the separability of its internal states regarding hallucinations while simultaneously learning to generate well-formed and meaningful responses. Across multiple benchmarks, RAGognizer achieves state-of-the-art token-level hallucination detection while substantially reducing hallucination rates during generation, without degrading language quality or relevance.
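The joint optimization of language modeling and hallucination detection can be illustrated with a small sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the detection head is taken to be a single linear layer with a sigmoid over each token's hidden state, the detection loss is binary cross-entropy against token-level labels, and the two losses are combined as a weighted sum using a hypothetical `det_weight` parameter. The code computes only the scalar loss on toy inputs. In real fine-tuning, the gradient of this loss would flow back into the LLM as well as the head.

```python
import math
from typing import List, Sequence, Tuple


def softmax_cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-likelihood of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def detection_head(hidden: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Hypothetical lightweight head: linear layer + sigmoid on one hidden state."""
    z = sum(h * w for h, w in zip(hidden, weights)) + bias
    if z >= 0:  # numerically stable sigmoid
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binary_cross_entropy(p: float, label: int, eps: float = 1e-12) -> float:
    """BCE for one token; label 1 = hallucinated, 0 = supported."""
    p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def joint_loss(
    token_logits: List[Sequence[float]],
    target_ids: Sequence[int],
    hidden_states: List[Sequence[float]],
    halluc_labels: Sequence[int],
    head_w: Sequence[float],
    head_b: float,
    det_weight: float = 1.0,  # assumed weighting; not taken from the paper
) -> Tuple[float, float, float]:
    """Return (total, lm_loss, det_loss), each averaged over tokens."""
    n = len(target_ids)
    lm = sum(softmax_cross_entropy(l, t) for l, t in zip(token_logits, target_ids)) / n
    det = sum(
        binary_cross_entropy(detection_head(h, head_w, head_b), y)
        for h, y in zip(hidden_states, halluc_labels)
    ) / n
    return lm + det_weight * det, lm, det


if __name__ == "__main__":
    # Toy example: 2 tokens, vocabulary of 4, hidden size 2.
    total, lm, det = joint_loss(
        token_logits=[[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]],
        target_ids=[1, 3],
        hidden_states=[[0.2, -0.1], [0.5, 0.3]],
        halluc_labels=[0, 1],
        head_w=[0.0, 0.0],
        head_b=0.0,
        det_weight=0.5,
    )
    print(f"lm={lm:.4f} det={det:.4f} total={total:.4f}")
```

Because both terms depend on the same hidden states, minimizing the sum rewards representations that predict the next token well and that also separate hallucinated tokens from supported ones, which is the effect the abstract attributes to the joint objective.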