Joint Evaluation of Answer and Reasoning Consistency for Hallucination Detection in Large Reasoning Models

📅 2025-06-05
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large reasoning models (LRMs) generate explicit multi-step reasoning traces to enhance interpretability, yet these traces can harbor implicit hallucinations: the final answer may appear correct while intermediate steps are redundant, inconsistent, or logically gapped, which makes such cases hard to detect. Existing hallucination detection methods focus primarily on answer-level uncertainty and neglect the intrinsic reliability of the reasoning chain itself. Method: We propose a joint evaluation paradigm that assesses both *answer correctness* and *reasoning consistency*, grounded in four diagnostic dimensions: (i) cross-sample reasoning consistency, (ii) entropy-based answer uncertainty, (iii) semantic alignment between reasoning and answer, and (iv) intra-reasoning coherence. On this basis, we design RACE (Reasoning and Answer Consistency Evaluation), a unified assessment framework that integrates step extraction, semantic similarity computation, entropy-based uncertainty modeling, and logical consistency analysis. Contribution/Results: RACE outperforms existing hallucination detection baselines across multiple benchmarks and LLMs, demonstrating strong robustness and cross-model generalization.
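To make the entropy-based uncertainty dimension concrete, here is a minimal sketch, not the paper's implementation: it assumes several answers sampled for the same question are grouped by normalized string match (the paper would use a more semantic grouping) and scores uncertainty as the Shannon entropy of the resulting answer distribution.

```python
# Illustrative sketch, not the paper's implementation: entropy-based answer
# uncertainty over multiple sampled generations. Answers are grouped by
# normalized string match (a stand-in for semantic clustering), and Shannon
# entropy over the group frequencies serves as the uncertainty signal.
import math
from collections import Counter

def answer_entropy(sampled_answers: list[str]) -> float:
    """Shannon entropy of the empirical answer distribution (higher = less certain)."""
    counts = Counter(a.strip().lower() for a in sampled_answers)
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    return -sum(p * math.log(p) for p in probs)

# Four of five samples agree, so entropy (uncertainty) is low.
print(answer_entropy(["Paris", "Paris", "Paris", "paris", "Lyon"]))
```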

📝 Abstract
Large Reasoning Models (LRMs) extend large language models with explicit, multi-step reasoning traces to enhance transparency and performance on complex tasks. However, these reasoning traces can be redundant or logically inconsistent, making them a new source of hallucination that is difficult to detect. Existing hallucination detection methods focus primarily on answer-level uncertainty and often fail to detect hallucinations or logical inconsistencies arising from the model's reasoning trace. This oversight is particularly problematic for LRMs, where the explicit thinking trace is not only an important support to the model's decision-making process but also a key source of potential hallucination. To this end, we propose RACE (Reasoning and Answer Consistency Evaluation), a novel framework specifically tailored for hallucination detection in LRMs. RACE operates by extracting essential reasoning steps and computing four diagnostic signals: inter-sample consistency of reasoning traces, entropy-based answer uncertainty, semantic alignment between reasoning and answers, and internal coherence of reasoning. This joint analysis enables fine-grained hallucination detection even when the final answer appears correct. Experiments across datasets and different LLMs demonstrate that RACE outperforms existing hallucination detection baselines, offering a robust and generalizable solution for evaluating LRMs. Our code is available at: https://github.com/bebr2/RACE.
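The two similarity-based signals named in the abstract, inter-sample consistency of reasoning traces and semantic alignment between reasoning and answers, can be sketched as follows. This is an illustrative approximation under assumptions: `embed` is a hypothetical placeholder for any sentence-embedding model, cosine similarity stands in for the paper's similarity computation, and step extraction is omitted.

```python
# Sketch of two similarity-based signals under assumptions: `embed` is a
# hypothetical placeholder for a sentence-embedding model, and cosine
# similarity stands in for the paper's similarity computation.
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder: deterministic-within-run pseudo-embedding keyed on the text.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(64)

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def cross_sample_consistency(reasoning_traces: list[str]) -> float:
    """Mean pairwise similarity of reasoning traces sampled for one question."""
    embs = [embed(t) for t in reasoning_traces]
    sims = [cosine(embs[i], embs[j])
            for i in range(len(embs)) for j in range(i + 1, len(embs))]
    return float(np.mean(sims)) if sims else 1.0

def reasoning_answer_alignment(reasoning: str, answer: str) -> float:
    """Semantic alignment between a reasoning trace and its final answer."""
    return cosine(embed(reasoning), embed(answer))
```

In practice the placeholder encoder would be replaced by a real sentence encoder, and each trace would first be reduced to its essential reasoning steps before comparison.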
Problem

Research questions and friction points this paper is trying to address.

How to detect hallucinations that arise within Large Reasoning Models' reasoning traces
How to evaluate the consistency between final answers and the reasoning steps that produce them
How to identify logical inconsistencies in multi-step reasoning processes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extracts essential reasoning steps for analysis
Computes four diagnostic signals jointly (see the combined-score sketch after this list)
Detects hallucinations beyond answer correctness
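A rough sketch of how the four signals might be combined into a single hallucination score is shown below; the equal weights, the risk-direction conversions, and the assumption that each signal is normalized to [0, 1] are illustrative assumptions, not the paper's actual aggregation.

```python
# Hypothetical aggregation of the four diagnostic signals into a single
# hallucination score. The equal weights, risk-direction conversions, and the
# assumption that every signal is normalized to [0, 1] are illustrative only.
from dataclasses import dataclass

@dataclass
class RaceSignals:
    cross_sample_consistency: float    # higher = reasoning agrees across samples
    answer_uncertainty: float          # higher = more uncertain answers (normalized entropy)
    reasoning_answer_alignment: float  # higher = reasoning supports the answer
    intra_reasoning_coherence: float   # higher = internally coherent steps

def hallucination_score(s: RaceSignals, weights=(0.25, 0.25, 0.25, 0.25)) -> float:
    """Weighted score; larger values indicate a likelier hallucination."""
    risks = [
        1.0 - s.cross_sample_consistency,
        s.answer_uncertainty,
        1.0 - s.reasoning_answer_alignment,
        1.0 - s.intra_reasoning_coherence,
    ]
    return sum(w * r for w, r in zip(weights, risks))

# Example with made-up signal values.
print(hallucination_score(RaceSignals(0.8, 0.3, 0.9, 0.7)))
```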