AI Summary
This study investigates whether large language reasoning models form decision intent prior to generating reasoning chains. Using linear probing and activation steering, the work provides the first evidence that models encode detectable decision representations before emitting any reasoning tokens. Experiments demonstrate that linear probes can decode final decisions with high confidence at this early stage, and that steering intermediate activations reverses model behavior in 7-79% of samples, with the subsequent reasoning chains often post-hoc rationalizing the altered decision. These findings challenge the prevailing assumption that models "think before deciding," instead revealing decision precommitment embedded within the reasoning process.
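To make the probing setup concrete, here is a minimal sketch of a linear probe on synthetic data. It is not the paper's implementation: the activations, labels, and dimensions below are stand-ins, and the probe is a simple difference-of-class-means direction rather than whatever classifier the authors trained.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # stand-in for the model's hidden dimension

# Synthetic "pre-generation activations": n samples x d dims, with labels
# 1 = "call tool", 0 = "answer directly", generated from a hidden direction.
w_true = rng.normal(size=d)
X = rng.normal(size=(1000, d))
y = (X @ w_true > 0).astype(int)

# Train/test split
X_tr, y_tr, X_te, y_te = X[:800], y[:800], X[800:], y[800:]

# A minimal linear probe: the difference of class-mean activations.
direction = X_tr[y_tr == 1].mean(axis=0) - X_tr[y_tr == 0].mean(axis=0)

# Decode the decision from held-out activations by projecting onto the probe.
pred = (X_te @ direction > 0).astype(int)
acc = (pred == y_te).mean()
print(f"held-out probe accuracy: {acc:.2f}")
```

On real models, `X` would be residual-stream activations collected at the last prompt token, before any reasoning tokens are generated; high held-out accuracy at that position is what "decoding the decision pre-generation" means.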
Abstract
We consider the question: when a large language reasoning model makes a choice, does it think first and then decide, or decide first and then think? In this paper, we present evidence that detectable, early-encoded decisions shape chain-of-thought in reasoning models. Specifically, we show that a simple linear probe decodes tool-calling decisions from pre-generation activations with very high confidence, in some cases before a single reasoning token is produced. Activation steering supports this causally: perturbing activations along the decision direction inflates deliberation and flips behavior in many examples (7-79%, depending on model and benchmark). We also show through behavioral analysis that, when steering changes the decision, the chain-of-thought often rationalizes the flip rather than resisting it. Together, these results suggest that reasoning models can encode action choices before they begin to deliberate in text.
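The steering intervention can be sketched in toy form as follows. This is a hedged illustration, not the authors' code: the "model" is a stub that decides by projecting its hidden state onto an assumed decision direction, and `alpha` is an illustrative steering strength.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 64  # stand-in hidden dimension

# Assumed unit "decision direction" in activation space.
decision_dir = rng.normal(size=d)
decision_dir /= np.linalg.norm(decision_dir)

def decide(h):
    """Toy stand-in for the model: 1 = call tool, 0 = answer directly."""
    return int(h @ decision_dir > 0)

# Construct a hidden state whose projection on the direction is -1,
# i.e. the unsteered decision is "no tool call".
h = rng.normal(size=d)
h -= (h @ decision_dir + 1.0) * decision_dir

# Steering: add a scaled copy of the decision direction to the activation,
# pushing the projection from -1 to +1 and flipping the decision.
alpha = 2.0
h_steered = h + alpha * decision_dir

print(decide(h), decide(h_steered))  # prints: 0 1
```

In the actual experiments, the addition would be applied to intermediate-layer activations during generation (e.g. via forward hooks), and the flip rate across samples is what yields the reported 7-79% range.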