🤖 AI Summary
Large language models (LLMs) suffer from faithfulness hallucinations in retrieval-augmented generation and summarization tasks. Method: The paper proposes a lightweight joint prediction-and-explanation framework built on a rule-based reinforcement learning paradigm that jointly optimizes prediction-correctness rewards and explanation-quality rewards; it employs LLM-driven data synthesis, multi-dimensional filtering, and supervised fine-tuning for cold-start initialization. Contribution/Results: The result is an end-to-end model that delivers both high-accuracy binary hallucination detection and natural-language explanations. Evaluated on 12 heterogeneous benchmarks, it significantly outperforms advanced models such as GPT-4.1 and o3. With only 8B parameters, it achieves state-of-the-art performance, combining high accuracy, high-quality explanations, and low inference overhead, and thereby offers an efficient path toward trustworthy LLM deployment.
📝 Abstract
Recognizing whether outputs from large language models (LLMs) contain faithfulness hallucinations is crucial for real-world applications, e.g., retrieval-augmented generation and summarization. In this paper, we introduce FaithLens, a cost-efficient and effective faithfulness hallucination detection model that jointly provides binary predictions and corresponding explanations to improve trustworthiness. To achieve this, we first synthesize training data with explanations via advanced LLMs and apply a well-defined data filtering strategy to ensure label correctness, explanation quality, and data diversity. Subsequently, we fine-tune the model on this well-curated training data as a cold start and further optimize it with rule-based reinforcement learning, using rewards for both prediction correctness and explanation quality. Results on 12 diverse tasks show that the 8B-parameter FaithLens outperforms advanced models such as GPT-4.1 and o3. FaithLens also produces high-quality explanations, delivering a distinctive balance of trustworthiness, efficiency, and effectiveness.
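The rule-based reward described above (a prediction-correctness term combined with an explanation-quality term) can be sketched roughly as follows. This is a minimal illustration only: the function name, the gating of the explanation term on a correct prediction, and the weight `alpha` are assumptions for exposition, not the paper's actual formulation.

```python
def rule_based_reward(pred_label: str,
                      gold_label: str,
                      explanation_quality: float,
                      alpha: float = 0.5) -> float:
    """Hypothetical combined reward for joint detection + explanation.

    pred_label / gold_label: binary labels, e.g. "faithful" vs "hallucinated".
    explanation_quality: a score in [0, 1], e.g. from an LLM judge.
    alpha: assumed weight on the explanation term (illustrative choice).
    """
    correctness = 1.0 if pred_label == gold_label else 0.0
    # Gate the explanation reward on prediction correctness so the model
    # cannot earn reward for fluently explaining a wrong label.
    return correctness + alpha * explanation_quality * correctness

# A correct prediction with a good explanation earns more than a correct
# prediction alone; a wrong prediction earns nothing.
r_good = rule_based_reward("hallucinated", "hallucinated", 0.8)   # 1.4
r_bare = rule_based_reward("hallucinated", "hallucinated", 0.0)   # 1.0
r_bad = rule_based_reward("faithful", "hallucinated", 0.9)        # 0.0
```

Gating the explanation term on correctness is one common design choice in rule-based RL rewards; an additive, ungated variant is equally plausible here.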