AI Summary
Addressing the challenges of fine-grained, span-level hallucination localization and reward imbalance in large language model (LLM) hallucination detection, this paper proposes RL4HS, the first framework to introduce span-level rewards into hallucination detection. Methodologically, RL4HS combines Chain-of-Thought (CoT) prompting, which models explicit reasoning paths, with Class-Aware Policy Optimization built on Group Relative Policy Optimization (GRPO), yielding an end-to-end reinforcement learning framework capable of multi-step, fine-grained identification of untrustworthy text spans. Compared with conventional binary classification and supervised fine-tuning approaches, RL4HS achieves significant improvements in both the accuracy and the robustness of hallucination span identification on the RAGTruth benchmark. The experimental results validate the effectiveness of explicit, reasoning-guided reinforcement learning for modeling complex hallucination structures, and the work establishes a new paradigm for interpretable, localizable trustworthy generation.
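The summary does not spell out how the span-level reward is computed. The sketch below shows one plausible choice under stated assumptions: a character-overlap F1 between predicted and gold hallucinated spans, which rewards partial localization rather than all-or-nothing matches. All identifiers here (Span, span_f1, _overlap) are illustrative, not names from the paper.

```python
# Hypothetical span-level reward: character-overlap F1 between predicted
# and gold hallucination spans. A minimal sketch, not the paper's exact
# reward; it assumes spans within each list are non-overlapping.
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def _overlap(a: Span, b: Span) -> int:
    """Number of overlapping characters between two spans."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def span_f1(predicted: List[Span], gold: List[Span]) -> float:
    """Character-overlap F1 between predicted and gold spans.

    Returns 1.0 when both lists are empty (the model correctly predicts
    'no hallucination') and 0.0 when exactly one list is empty.
    """
    if not predicted and not gold:
        return 1.0
    if not predicted or not gold:
        return 0.0
    overlap = sum(_overlap(p, g) for p in predicted for g in gold)
    pred_len = sum(e - s for s, e in predicted)
    gold_len = sum(e - s for s, e in gold)
    precision = overlap / pred_len
    recall = overlap / gold_len
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

A reward like this makes the imbalance issue concrete: non-hallucinated examples trivially earn 1.0 by predicting nothing, which is one reason a class-aware correction to the policy update is needed.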
Abstract
Large language models (LLMs) often generate hallucinations -- unsupported content that undermines reliability. While most prior works frame hallucination detection as a binary task, many real-world applications require identifying hallucinated spans, which is a multi-step decision-making process. This naturally raises the question of whether explicit reasoning can help the complex task of detecting hallucination spans. To answer this question, we first evaluate pretrained models with and without Chain-of-Thought (CoT) reasoning, and show that CoT reasoning has the potential to generate at least one correct answer when sampled multiple times. Motivated by this, we propose RL4HS, a reinforcement learning framework that incentivizes reasoning with a span-level reward function. RL4HS builds on Group Relative Policy Optimization and introduces Class-Aware Policy Optimization to mitigate the reward imbalance issue. Experiments on the RAGTruth benchmark (summarization, question answering, data-to-text) show that RL4HS surpasses pretrained reasoning models and supervised fine-tuning, demonstrating the necessity of reinforcement learning with span-level rewards for detecting hallucination spans.
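The abstract names GRPO and a Class-Aware variant but gives no formulas. The sketch below illustrates the standard GRPO ingredient, group-normalized advantages over G sampled responses to the same prompt, plus one hypothetical way a class-aware correction could rescale those advantages. The per-class weights (w_pos, w_neg) and the rescaling scheme are assumptions for illustration only, not the paper's actual Class-Aware Policy Optimization.

```python
# Minimal sketch of GRPO-style advantages with a hypothetical
# class-aware reweighting. Assumes `rewards` holds span-level rewards
# for a group of responses sampled from the same prompt.
import numpy as np


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO baseline: normalize rewards within the sampled group so the
    advantage of each response is relative to its group mean."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)


def class_aware_advantages(rewards, is_hallucinated, w_pos=1.0, w_neg=0.5):
    """Hypothetical class-aware variant: scale each advantage by a class
    weight so the abundant, easy 'no hallucination' cases do not
    dominate the policy gradient (the weighting is an assumption)."""
    adv = group_relative_advantages(rewards)
    weights = np.where(np.asarray(is_hallucinated), w_pos, w_neg)
    return adv * weights
```

Under this reading, the class weight acts on the advantage rather than the raw reward, so the group-relative baseline is preserved while the gradient contribution of the majority class is damped; other corrections (e.g., class-balanced sampling) would be equally consistent with the abstract.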