🤖 AI Summary
This work identifies and characterizes “performative reasoning” in large language models—a phenomenon in which models continue generating redundant chain-of-thought text after already forming an internal answer, effectively staging a “reasoning theater.” To detect it, the authors introduce a method based on activation probing of the model’s internal belief state, combined with early forced answering and chain-of-thought monitoring. Evaluated on DeepSeek-R1 671B and GPT-OSS 120B, the approach distinguishes genuine multi-hop reasoning from performative reasoning and enables adaptive computation: probe-guided early exit reduces generated tokens by up to 80% on MMLU and 30% on GPQA-Diamond while maintaining comparable accuracy.
📝 Abstract
We provide evidence of performative chain-of-thought (CoT) in reasoning models, where a model becomes strongly confident in its final answer but continues generating tokens without revealing its internal belief. Our analysis compares activation probing, early forced answering, and a CoT monitor across two large models (DeepSeek-R1 671B and GPT-OSS 120B) and finds task-difficulty-specific differences: the model's final answer is decodable from activations far earlier in the CoT than a monitor can detect, especially for easy recall-based MMLU questions. We contrast this with genuine reasoning on difficult multi-hop GPQA-Diamond questions. Despite this, inflection points (e.g., backtracking, 'aha' moments) occur almost exclusively in responses where probes show large belief shifts, suggesting these behaviors track genuine uncertainty rather than learned "reasoning theater." Finally, probe-guided early exit reduces tokens by up to 80% on MMLU and 30% on GPQA-Diamond with similar accuracy, positioning activation probing as an efficient tool for detecting performative reasoning and enabling adaptive computation.
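To make the probe-guided early-exit idea concrete, here is a minimal sketch under stated assumptions: we stand in for real per-step CoT activations with synthetic vectors, and use a simple difference-of-means linear probe (a common belief-probing baseline, not necessarily the paper's exact probe). All names, dimensions, and thresholds below are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 64  # stand-in hidden dimension (real models use thousands)

# Synthetic "belief direction": activations for answer class 1 vs class 0
# are separated along this axis, plus isotropic noise.
direction = 0.2 * rng.normal(size=D)

def fake_activation(label: int, noise: float = 1.0) -> np.ndarray:
    return rng.normal(scale=noise, size=D) + (1.0 if label else -1.0) * direction

# "Train" the probe on labeled activations (label = the model's final answer).
labels = rng.integers(0, 2, size=500)
X = np.stack([fake_activation(int(y)) for y in labels])
mu1, mu0 = X[labels == 1].mean(axis=0), X[labels == 0].mean(axis=0)
w = mu1 - mu0                    # difference-of-means probe direction
b = -w @ (mu1 + mu0) / 2.0       # bias placing the boundary at the midpoint

def probe_confidence(h: np.ndarray) -> float:
    """Sigmoid of the probe logit: confidence that the final answer is class 1."""
    return float(1.0 / (1.0 + np.exp(-(w @ h + b))))

def early_exit_step(trace: np.ndarray, threshold: float = 0.95):
    """First CoT step whose probe confidence clears the threshold, else None."""
    for t, h in enumerate(trace):
        p = probe_confidence(h)
        if max(p, 1.0 - p) >= threshold:
            return t
    return None

# Simulate a 30-step CoT trace whose internal belief sharpens over time
# (noise shrinks), as the abstract describes for easy recall questions.
trace = np.stack([fake_activation(1, noise=3.0 / (t + 1)) for t in range(30)])
step = early_exit_step(trace)
```

Once `early_exit_step` fires well before the trace ends, the remaining tokens are, in the paper's terms, performative: generation could stop there and the answer forced early, which is the mechanism behind the reported token savings.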