🤖 AI Summary
Large reasoning models (LRMs) adaptively allocate reasoning strength—measured by the number of reasoning tokens—based on question difficulty, yet the underlying mechanism remains poorly understood.
Method: We conduct a mechanistic analysis grounded in activation states, revealing that LRMs pre-plan reasoning length prior to token generation. Using linear probing, activation space decomposition, logits-layer interventions, and directional control experiments, we identify a latent “pre-allocation direction vector” whose norm causally governs reasoning length.
Contribution/Results: We propose the Pre-Allocation Direction Vector theory: (1) its norm accurately predicts reasoning length; (2) adding or subtracting this direction enables controllable modulation of both reasoning length and task performance; and (3) it supports overthinking detection and efficient inference on simple questions. Our work establishes an interpretable, intervention-friendly paradigm for understanding and steering LRMs’ difficulty-aware reasoning behavior.
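The probing step of the summary above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the "question activations" are replaced by synthetic vectors whose projection onto a hypothetical pre-allocation direction `v` determines a synthetic reasoning length, and a ridge-regression linear probe is fit to recover that length, mirroring the claim that reasoning length is linearly decodable from activations.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
d, n = 64, 200                      # activation dimension, number of questions (toy sizes)

# Hypothetical pre-allocation direction vector (unit norm).
v = rng.normal(size=d)
v /= np.linalg.norm(v)

# Stand-in "question activations": in the paper these would come from the LRM.
X = rng.normal(size=(n, d))

# Synthetic reasoning lengths: proportional to the projection onto v, plus noise.
lengths = 500.0 + 300.0 * (X @ v) + rng.normal(scale=10.0, size=n)

# Linear probe: predict reasoning length from activations alone.
probe = Ridge(alpha=1.0).fit(X, lengths)
r2 = probe.score(X, lengths)        # high R^2 -> length is linearly decodable
```

Because the synthetic lengths are constructed to be (nearly) linear in the activations, the probe recovers them almost exactly; with real model activations, the paper reports that a probe of this form predicts reasoning length well before any tokens are generated.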
📝 Abstract
Recent studies empirically reveal that large reasoning models (LRMs) can automatically allocate more reasoning strength (i.e., a larger number of reasoning tokens) to harder problems, exhibiting difficulty-awareness that improves task performance. While this automatic reasoning strength allocation phenomenon has been widely observed, its underlying mechanism remains largely unexplored. To this end, we explain this phenomenon from the perspective of model activations. We find evidence that LRMs pre-plan the reasoning strength in their activations even before generation, and that this reasoning strength is causally controlled by the magnitude of a pre-allocated direction vector. Specifically, we show that the number of reasoning tokens is predictable from the question activations alone using linear probes, indicating that LRMs estimate the required reasoning strength in advance. We then uncover that LRMs encode this reasoning strength through a pre-allocated direction vector embedded in the model's activations, where the vector's magnitude modulates the reasoning strength. Subtracting this vector reduces the number of reasoning tokens and degrades performance, while adding it increases the number of reasoning tokens and can even improve performance. We further reveal that this direction vector consistently yields a positive reasoning-length prediction, and that it modifies the logits of the end-of-reasoning token to affect the reasoning length. Finally, we demonstrate two potential applications of our findings: detecting overthinking behavior and enabling efficient reasoning on simple problems. Our work provides new insights into the internal mechanisms of reasoning in LRMs and offers practical tools for controlling their reasoning behaviors. Our code is available at https://github.com/AlphaLab-USTC/LRM-plans-CoT.
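The logits-level mechanism described above (adding the direction vector suppresses the end-of-reasoning logit, prolonging reasoning) can be illustrated with a toy computation. Everything here is a stand-in: the unembedding matrix `W`, the token index `eot` (playing the role of an end-of-reasoning token such as `</think>`), and the choice of `v` as anti-aligned with that token's unembedding row are illustrative assumptions, not the paper's extracted vector.

```python
import numpy as np

rng = np.random.default_rng(0)
d, vocab = 64, 100
W = rng.normal(size=(vocab, d))       # toy unembedding matrix (rows = token directions)
eot = 7                               # index standing in for the end-of-reasoning token

# Toy pre-allocation direction: anti-aligned with the end-of-reasoning unembedding,
# so adding it pushes the end-of-reasoning logit DOWN (longer reasoning).
v = -W[eot] / np.linalg.norm(W[eot])

h = rng.normal(size=d)                # a hidden state at some reasoning step
alpha = 5.0                           # intervention strength

logits_base = W @ h
logits_steered = W @ (h + alpha * v)  # activation-addition intervention

delta = logits_steered[eot] - logits_base[eot]   # = -alpha * ||W[eot]|| < 0
```

Subtracting `alpha * v` instead would raise the end-of-reasoning logit, terminating reasoning earlier, which matches the bidirectional control reported in the abstract.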