Can Aha Moments Be Fake? Identifying True and Decorative Thinking Steps in Chain-of-Thought

📅 2025-10-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
It remains unclear whether chain-of-thought (CoT) reasoning generated by large language models (LLMs) genuinely reflects their internal inference process or merely serves as post-hoc rationalization. Method: The authors propose the True Thinking Score (TTS) to quantify the causal influence of each CoT step on the model's final prediction, and identify a "TrueThinking direction" in the model's latent space via causal analysis and activation steering. Contribution/Results: Empirical analysis shows that only a small fraction of CoT steps achieve a high TTS (e.g., on AIME with Qwen-2.5, an average of 2.3% of steps reach TTS >= 0.7), indicating that most steps are decorative rather than causally consequential. Notably, even self-verification steps ("aha moments") can be decorative, with the model not truly verifying its solution internally; steering along the TrueThinking direction forces internal reasoning over such steps and can change the final answer. This work provides a quantifiable, interpretable measure of authentic reasoning within CoT and reveals that LLMs often verbalize reasoning steps without actually performing them, with implications for both reasoning efficiency and the trustworthiness of CoT-based monitoring.

📝 Abstract
Recent large language models (LLMs) can generate long Chain-of-Thought (CoT) at test time, enabling them to solve complex tasks. These reasoning steps in CoT are often assumed to be a faithful reflection of the model's internal thinking process, and are used to monitor unsafe intentions. However, we find that many reasoning steps do not truly contribute to LLMs' predictions. We measure the step-wise causal influence of each reasoning step on the model's final prediction with a proposed True Thinking Score (TTS). We reveal that LLMs often interleave true-thinking steps (which are genuinely used to produce the final output) with decorative-thinking steps (which only give the appearance of reasoning but have minimal causal impact). Notably, only a small subset of the total reasoning steps have a high TTS that causally drives the model's prediction: e.g., on the AIME dataset, only an average of 2.3% of reasoning steps in CoT have a TTS >= 0.7 (range: 0-1) under the Qwen-2.5 model. Furthermore, we identify a TrueThinking direction in the latent space of LLMs. By steering along or against this direction, we can force the model to perform or disregard certain CoT steps when computing the final result. Finally, we highlight that self-verification steps in CoT (i.e., aha moments) can also be decorative, where LLMs do not truly verify their solution. Steering along the TrueThinking direction can force internal reasoning over these steps, resulting in a change in the final results. Overall, our work reveals that LLMs often verbalize reasoning steps without actually performing them internally, which undermines both the efficiency of LLM reasoning and the trustworthiness of CoT.
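The abstract describes TTS as a step-wise causal measure of how much each reasoning step drives the final prediction. The paper's exact formula is not given here, so the following is a minimal sketch under one plausible assumption: a step's TTS is the normalized drop in the model's probability of its final answer when that step is ablated from the CoT. The function name and the normalization are illustrative, not the authors' definition.

```python
def true_thinking_score(p_with_step: float, p_without_step: float) -> float:
    """Hypothetical TTS sketch: normalized drop in the final-answer
    probability when a single CoT step is ablated.

    p_with_step: P(answer | full CoT)
    p_without_step: P(answer | CoT with this step removed)
    Returns a value in [0, 1]; 0 means the step is purely decorative.
    """
    if p_with_step <= 0.0:
        return 0.0
    drop = p_with_step - p_without_step
    # Clamp so that steps whose removal *helps* the answer score 0.
    return max(0.0, min(1.0, drop / p_with_step))

# Decorative step: ablating it barely changes the answer probability.
print(true_thinking_score(0.90, 0.88))  # ~0.022, well below the 0.7 threshold
# True-thinking step: ablating it collapses the answer probability.
print(true_thinking_score(0.90, 0.10))  # ~0.889, above the 0.7 threshold
```

Under this reading, the paper's 2.3% figure means that very few steps cross a TTS threshold of 0.7, i.e., very few ablations substantially change the model's answer.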
Problem

Research questions and friction points this paper is trying to address.

Identifying fake reasoning steps in Chain-of-Thought generation
Measuring causal impact of reasoning steps on final predictions
Revealing decorative thinking steps that don't contribute internally
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposed True Thinking Score measures causal influence
Identified TrueThinking direction in latent model space
Steering direction forces internal reasoning over steps
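The steering idea above follows the standard activation-steering recipe: add (or subtract) a fixed direction vector to a hidden state at inference time. The paper's TrueThinking direction itself is learned from causal analysis and is not reproduced here; the sketch below only shows the generic intervention, with pure-Python vectors for clarity.

```python
import math

def steer_hidden_state(h: list[float], direction: list[float], alpha: float) -> list[float]:
    """Shift a hidden-state vector by alpha times a unit-normalized direction.

    With a (hypothetical) TrueThinking direction, alpha > 0 would encourage
    internal reasoning over a CoT step and alpha < 0 would suppress it.
    """
    norm = math.sqrt(sum(x * x for x in direction))
    return [h_i + alpha * d_i / norm for h_i, d_i in zip(h, direction)]

# Toy example: steer a 4-dim hidden state along the first basis direction.
print(steer_hidden_state([0.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0], 2.0))
# → [2.0, 0.0, 0.0, 0.0]
```

In practice this intervention would be applied inside a transformer's forward pass (e.g., via a forward hook on a chosen layer) at the token positions of the targeted reasoning step.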