RAGognizer: Hallucination-Aware Fine-Tuning via Detection Head Integration

📅 2026-04-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses closed-domain hallucination in large language models used in retrieval-augmented generation (RAG) systems, where generated content is not supported by the retrieved context. The authors propose a lightweight detection head that is jointly optimized with the language model, so that token-level hallucination signals derived from internal states serve directly as a fine-tuning signal. To support this approach, they construct RAGognize, a dataset of naturally occurring closed-domain hallucinations annotated at the token level. The joint training objective improves the separability of internal representations with respect to hallucinations, and the method achieves state-of-the-art token-level hallucination detection across multiple benchmarks. It also substantially reduces hallucination rates during generation while preserving the quality and relevance of the generated text.

📝 Abstract

Retrieval-Augmented Generation (RAG) is widely used to augment the input to Large Language Models (LLMs) with external information, such as recent or domain-specific knowledge. Nonetheless, current models still produce closed-domain hallucinations and generate content that is unsupported by the retrieved context. Current detection approaches typically treat hallucination as a post-hoc problem, relying on black-box consistency checks or probes over frozen internal representations. In this work, we demonstrate that hallucination detection based on internal state representation can also serve as a direct training signal. We introduce RAGognize, a dataset of naturally occurring closed-domain hallucinations with token-level annotations, and RAGognizer, a hallucination-aware fine-tuning approach that integrates a lightweight detection head into an LLM, allowing for the joint optimization of language modeling and hallucination detection. This joint objective forces the model to improve the separability of its internal states regarding hallucinations while simultaneously learning to generate well-formed and meaningful responses. Across multiple benchmarks, RAGognizer achieves state-of-the-art token-level hallucination detection while substantially reducing hallucination rates during generation, without degrading language quality or relevance.
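The joint objective described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the linear form of the detection head, the binary cross-entropy detection loss, the weighting factor `lam`, and all function names are assumptions made for illustration, since this page does not specify them. The sketch only computes the combined loss value for one sequence; in an actual fine-tuning setup, gradients from the detection term would flow back through the hidden states into the language model.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over one token's vocabulary logits.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def lm_loss(token_logits, target_ids):
    # Mean next-token cross-entropy (the language modeling term).
    total = 0.0
    for logits, target in zip(token_logits, target_ids):
        total -= log_softmax(logits)[target]
    return total / len(target_ids)

def sigmoid(z):
    # Stable sigmoid that avoids overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def detection_probs(hidden_states, weights, bias):
    # Hypothetical lightweight head: one linear layer plus sigmoid,
    # applied to each token's hidden state.
    return [sigmoid(sum(w * x for w, x in zip(weights, h)) + bias)
            for h in hidden_states]

def detection_loss(probs, labels, eps=1e-12):
    # Mean binary cross-entropy against token-level labels
    # (1 = hallucinated token, 0 = supported token).
    total = 0.0
    for p, y in zip(probs, labels):
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(labels)

def joint_loss(token_logits, target_ids, hidden_states, labels,
               weights, bias, lam=1.0):
    # Assumed combination: LM loss + lam * detection loss.
    probs = detection_probs(hidden_states, weights, bias)
    return lm_loss(token_logits, target_ids) + lam * detection_loss(probs, labels)

# Toy usage: 2 tokens, vocabulary of 4, hidden size 3.
token_logits = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
target_ids = [1, 3]
hidden_states = [[0.2, -0.1, 0.4], [1.0, 0.5, -0.3]]
labels = [0, 1]
loss = joint_loss(token_logits, target_ids, hidden_states, labels,
                  weights=[0.0, 0.0, 0.0], bias=0.0, lam=0.5)
```

With `lam = 0`, the objective reduces to ordinary language-model fine-tuning; larger values put more weight on making hallucinated and supported tokens separable in the hidden states.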

Problem

Research questions and friction points this paper is trying to address.

hallucination

retrieval-augmented generation

closed-domain hallucinations

token-level hallucination detection

Innovation

Methods, ideas, or system contributions that make the work stand out.

hallucination-aware fine-tuning

detection head integration

retrieval-augmented generation

token-level hallucination detection