🤖 AI Summary
This study addresses the problem of assessing faithfulness in the intermediate reasoning steps of Large Reasoning Models (LRMs), i.e., whether such steps genuinely exert causal influence on the final answer, to enable reliable monitoring, interpretation, and control. To this end, we propose a faithfulness evaluation along two dimensions: *Intra-Draft* (measuring step-to-step causal consistency within a reasoning draft) and *Draft-to-Answer* (measuring the causal influence of the entire draft on the final output). Our method introduces a causal verification framework grounded in controlled counterfactual interventions: systematically perturbing individual reasoning steps or the draft's concluding logic and quantifying the resulting changes in subsequent steps or the final answer. Empirical evaluation across six state-of-the-art LRMs reveals a pervasive “selective faithfulness” phenomenon, in which certain intermediate steps are ignored or their conclusions abandoned, exposing fundamental limitations in the causal coherence and explanatory reliability of current chain-of-thought reasoning.
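To make the intra-draft intervention concrete, the sketch below replaces one reasoning step with a contradictory counterfactual, regenerates the rest of the draft, and checks whether the conclusion changes. This is a minimal sketch under stated assumptions: the `generate` and `extract_answer` callables are hypothetical stand-ins for a real LRM decoding API, and the plain newline-joined prompt format is illustrative, not the paper's implementation.

```python
from typing import Callable, List


def intra_draft_probe(
    question: str,
    steps: List[str],
    step_idx: int,
    counterfactual: str,
    generate: Callable[[str], str],        # hypothetical: continues a prompt
    extract_answer: Callable[[str], str],  # hypothetical: pulls out the conclusion
) -> bool:
    """Swap step `step_idx` for a contradicting counterfactual step,
    regenerate the remainder of the draft, and report whether the
    draft's conclusion changed. A causally faithful model should
    propagate the intervention downstream; an unchanged conclusion
    suggests the step was ignored ("selective faithfulness")."""
    prefix = steps[:step_idx]

    # Continuation from the unmodified prefix plus the original step.
    base = generate(question + "\n" + "\n".join(prefix + [steps[step_idx]]))

    # Continuation from the same prefix with the counterfactual inserted.
    perturbed = generate(question + "\n" + "\n".join(prefix + [counterfactual]))

    # True  -> the intervention propagated (faithful at this step).
    # False -> the conclusion survived a contradicted step (unfaithful).
    return extract_answer(base) != extract_answer(perturbed)
```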
📝 Abstract
Large Reasoning Models (LRMs) have significantly enhanced complex problem-solving by introducing a thinking draft that enables multi-path Chain-of-Thought exploration before producing final answers. Ensuring the faithfulness of these intermediate reasoning processes is crucial for reliable monitoring, interpretation, and effective control. In this paper, we propose a systematic counterfactual intervention framework to rigorously evaluate thinking draft faithfulness. Our approach focuses on two complementary dimensions: (1) Intra-Draft Faithfulness, which assesses, via counterfactual step insertions, whether individual reasoning steps causally influence subsequent steps and the final draft conclusion; and (2) Draft-to-Answer Faithfulness, which evaluates, by perturbing the draft's concluding logic, whether final answers are logically consistent with and dependent on the thinking draft. We conduct extensive experiments across six state-of-the-art LRMs. Our findings show that current LRMs exhibit only selective faithfulness to intermediate reasoning steps and frequently fail to align their final answers with the draft's conclusions. These results underscore the need for more faithful and interpretable reasoning in advanced LRMs.
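Read causally, both dimensions can be summarized as intervention effects. The formalization below is a hedged reading of the setup in our own notation, not the paper's exact definitions: $d = (s_1, \dots, s_n)$ is the thinking draft, $\tilde{s}_i$ a counterfactual replacement for step $i$, $c(\cdot)$ the conclusion of the draft regenerated from a given prefix and step, $a(\cdot)$ the final answer conditioned on a draft, and $\tilde{d}$ a draft whose concluding logic has been perturbed:

$$
F_{\text{intra}}(i) \;=\; \Pr\!\big[\,c(s_{1:i-1}, \tilde{s}_i) \neq c(s_{1:i-1}, s_i)\,\big],
\qquad
F_{\text{d2a}} \;=\; \Pr\!\big[\,a(\tilde{d}) \neq a(d)\,\big].
$$

Under this reading, high values indicate that interventions propagate as they should, while low values correspond to the selective faithfulness the experiments report.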