Trustworthy Reasoning: Evaluating and Enhancing Factual Accuracy in LLM Intermediate Thought Processes

📅 2025-07-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) frequently exhibit latent factual errors in intermediate reasoning steps, undermining decision reliability—especially in high-stakes domains. Method: We propose RELIANCE, a three-stage framework: (1) a fine-grained fact-checking classifier trained via counterfactual data augmentation to precisely localize factual deviations in reasoning chains; (2) a multidimensional reward-driven Group Relative Policy Optimization (GRPO) reinforcement learning algorithm that explicitly corrects flawed reasoning paths; and (3) activation-state-based interpretability analysis to characterize how factual grounding reshapes internal reasoning dynamics. Contribution/Results: Evaluated across 10 mainstream LLMs, RELIANCE improves factual accuracy of intermediate reasoning steps by up to 49.90%. Crucially, it preserves or surpasses baseline performance on challenging benchmarks—including Math-500, AIME-2024, and GPQA—demonstrating, for the first time, concurrent enhancement of factual trustworthiness and complex reasoning capability.
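As a concrete illustration of stage (1), here is a minimal sketch of counterfactual data augmentation for a step-level fact-checking classifier. The paper does not publish its augmentation rules or classifier architecture, so the perturbation heuristics and helper names (`counterfactual_perturb`, `build_training_pairs`) below are hypothetical.

```python
# Sketch of counterfactual augmentation for a step-level fact checker.
# The perturbation rules are illustrative assumptions, not the paper's recipe.
import random
import re

def counterfactual_perturb(step: str) -> str:
    """Corrupt one factual anchor in a reasoning step to create a
    hard negative, e.g. scaling a number or negating the claim."""
    numbers = re.findall(r"\d+\.?\d*", step)
    if numbers:
        n = random.choice(numbers)
        corrupted = str(float(n) * random.choice([0.5, 2, 10]))
        return step.replace(n, corrupted, 1)
    # Fallback when no numeric anchor exists: flip the claim's polarity.
    return step.replace(" is ", " is not ", 1)

def build_training_pairs(reasoning_steps):
    """Each verified step yields a factual positive and a counterfactually
    corrupted negative with nearly identical surface form."""
    pairs = []
    for step in reasoning_steps:
        pairs.append((step, 1))                          # factual
        pairs.append((counterfactual_perturb(step), 0))  # counterfactual
    return pairs

steps = ["The boiling point of water at sea level is 100 degrees Celsius."]
print(build_training_pairs(steps))
```

Pairing each verified step with a minimally corrupted twin gives the classifier hard negatives that differ only in the factual anchor, which is what lets it localize the deviating step rather than judge the chain as a whole.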

📝 Abstract
We present RELIANCE (Reasoning Evaluation with Logical Integrity and Accuracy for Confidence Enhancement), a novel framework addressing a critical vulnerability in Large Language Models (LLMs): the prevalence of factual inaccuracies within intermediate reasoning steps despite correct final answers. This phenomenon poses substantial risks in high-stakes domains including healthcare, legal analysis, and scientific research, where erroneous yet confidently presented reasoning can mislead users into dangerous decisions. Our framework integrates three core components: (1) a specialized fact-checking classifier trained on counterfactually augmented data to detect subtle factual inconsistencies within reasoning chains; (2) a Group Relative Policy Optimization (GRPO) reinforcement learning approach that balances factuality, coherence, and structural correctness through multi-dimensional rewards; and (3) a mechanistic interpretability module examining how factuality improvements manifest in model activations during reasoning processes. Extensive evaluation across ten state-of-the-art models reveals concerning patterns: even leading models like Claude-3.7 and GPT-o1 demonstrate reasoning factual accuracy of only 81.93% and 82.57% respectively. RELIANCE significantly enhances factual robustness (up to 49.90% improvement) while maintaining or improving performance on challenging benchmarks including Math-500, AIME-2024, and GPQA. Furthermore, our activation-level analysis provides actionable insights into how factual enhancements reshape reasoning trajectories within model architectures, establishing foundations for future training methodologies that explicitly target factual robustness through activation-guided optimization.
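The second component builds on GRPO, which scores a group of sampled completions against one another instead of against a learned critic. Below is a minimal sketch of that group-relative advantage computation combined with the multi-dimensional reward the abstract describes; the 0.5/0.3/0.2 weighting over factuality, coherence, and structure is an illustrative assumption, not the paper's tuned configuration.

```python
# Sketch of GRPO's group-relative advantage with a multi-dimensional reward.
# Reward weights and per-dimension scores are placeholders for illustration.
import numpy as np

def combined_reward(factuality, coherence, structure, w=(0.5, 0.3, 0.2)):
    """Scalarize the three reward dimensions into one training signal."""
    return w[0] * factuality + w[1] * coherence + w[2] * structure

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO normalizes each sampled completion's reward against its own
    group: A_i = (r_i - mean(r)) / std(r). No value network is needed."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

# One prompt, a group of 4 sampled reasoning chains, per-dimension scores
# ordered as (factuality, coherence, structure).
scores = [(0.9, 0.8, 1.0), (0.4, 0.9, 1.0), (0.7, 0.6, 0.5), (0.2, 0.3, 0.8)]
rewards = [combined_reward(*s) for s in scores]
print(group_relative_advantages(rewards))  # positive -> reinforced
```

Because advantages are normalized within the sampled group, no separate critic has to be trained, which keeps the reinforcement learning stage comparatively lightweight.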
Problem

Research questions and friction points this paper is trying to address.

Detect factual inaccuracies in LLM reasoning steps
Improve factual robustness in high-stakes domains
Balance factuality and coherence in model outputs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fact-checking classifier detects reasoning inconsistencies
GRPO balances factuality and coherence via rewards
Interpretability module analyzes activation-level improvements (see the sketch after this list)
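One plausible instantiation of that activation-level analysis is a linear probe over hidden states: if factual grounding reshapes internal reasoning dynamics, factual and counterfactual steps should become more linearly separable in the model's activations. The sketch below uses `gpt2` as a small stand-in for the ten LLMs the paper studies, and the layer choice and probing setup are assumptions, not the paper's method.

```python
# Sketch of activation-level analysis: probe whether one layer's hidden
# states linearly separate factual from counterfactual reasoning steps.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

MODEL = "gpt2"  # small stand-in; the paper evaluates ten larger LLMs
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL, output_hidden_states=True)

@torch.no_grad()
def layer_activation(text: str, layer: int) -> torch.Tensor:
    """Mean-pool the hidden states of one layer for a reasoning step."""
    out = model(**tok(text, return_tensors="pt"))
    return out.hidden_states[layer].mean(dim=1).squeeze(0)

steps = ["Water boils at 100 C at sea level.",
         "Water boils at 50 C at sea level."]
labels = [1, 0]  # factual vs. counterfactual
X = torch.stack([layer_activation(s, layer=6) for s in steps]).numpy()
probe = LogisticRegression(max_iter=1000).fit(X, labels)
print("probe accuracy:", probe.score(X, labels))
```

Comparing such probe accuracies before and after RELIANCE training, layer by layer, is one way to characterize where in the architecture factual grounding takes hold.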
👥 Authors
Rui Jiao
Tsinghua University
AIDD · Generative Models · Graph Neural Networks
Yue Zhang
School of Computer Science and Technology, Shandong University
Jinku Li
School of Cyber Engineering, Xidian University