Reasoning Models Don't Just Think Longer, They Move Differently

📅 2026-05-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study asks whether reasoning-trained models handle harder problems merely by generating more reasoning steps or by following a different internal trajectory. Because generation length mechanically alters trajectory statistics, the authors analyze the geometry of hidden-state trajectories only after residualizing those statistics on length. Across competitive programming, mathematics, and Boolean satisfiability, difficulty remains coupled to the corrected geometry. The clearest reasoning-specific separation appears in code, where harder problems show more direct corrected trajectories and less heterogeneous local curvature in reasoning-trained models than in matched instruction-tuned baselines. The coupling is weaker, but still present, in mathematics and Boolean satisfiability. Prompt-stage linear probes do not mirror the code-domain separation, and behavioral annotations show that stronger corrected coupling co-occurs with strategy shifts and uncertainty monitoring. The results suggest that reasoning training can be associated with distinct corrected trajectory geometry, not just longer generations, with the strength of the effect depending on the domain.

📝 Abstract

Reasoning-trained language models often spend more tokens on harder problems, but longer chains of thought do not show whether a model is merely computing for more steps or following a different internal trajectory. We study this distinction through hidden-state trajectories during chain-of-thought generation across competitive programming, mathematics, and Boolean satisfiability. Raw trajectory geometry is strongly shaped by generation length: longer generations mechanically alter path statistics, so difficulty-dependent comparisons are misleading without adjustment. After residualizing trajectory statistics on length, difficulty remains systematically coupled to corrected trajectory geometry across all domains studied. The clearest reasoning-specific separation appears in the code domain, where harder problems show more direct corrected trajectories and less heterogeneous local curvature in reasoning-trained models than in matched instruction-tuned baselines. Corrected difficulty-geometry coupling is weaker, but still present, in mathematics and Boolean satisfiability. Prompt-stage linear probes do not mirror the code-domain separation, and behavioral annotations show that stronger corrected coupling co-occurs with strategy shifts and uncertainty monitoring. Together, these findings establish length correction as a prerequisite for generation-time trajectory analysis and show that reasoning training can be associated with distinct corrected trajectory geometry, with the strength of the effect depending on the domain.
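The length correction described in the abstract can be illustrated with a minimal sketch. The abstract does not define the trajectory statistics or the regression used, so everything below is an assumption made for illustration: `directness` (net displacement over path length), `curvature_spread` (standard deviation of turning angles), and `residualize` (ordinary least squares on log length) are hypothetical stand-ins, not the authors' definitions.

```python
import numpy as np

def directness(h):
    """Net displacement divided by total path length for a (T, d) trajectory.

    Assumed statistic: 1.0 means a perfectly straight path.
    """
    steps = np.diff(h, axis=0)
    path_len = np.linalg.norm(steps, axis=1).sum()
    net = np.linalg.norm(h[-1] - h[0])
    return float(net / path_len) if path_len > 0 else 0.0

def curvature_spread(h):
    """Standard deviation of turning angles between consecutive steps.

    Assumed proxy for 'heterogeneity of local curvature'.
    """
    steps = np.diff(h, axis=0)
    norms = np.linalg.norm(steps, axis=1)
    keep = norms > 0                      # drop zero-length steps
    u = steps[keep] / norms[keep, None]   # unit step directions
    if len(u) < 2:
        return 0.0
    cos = np.clip((u[:-1] * u[1:]).sum(axis=1), -1.0, 1.0)
    return float(np.std(np.arccos(cos)))

def residualize(stat, length):
    """Residuals of `stat` after an OLS fit on [1, log(length)].

    Assumed functional form; the source only says statistics are
    residualized on length.
    """
    stat = np.asarray(stat, dtype=float)
    length = np.asarray(length, dtype=float)
    X = np.column_stack([np.ones_like(length), np.log(length)])
    beta, *_ = np.linalg.lstsq(X, stat, rcond=None)
    return stat - X @ beta

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for hidden-state trajectories of varying length.
    lengths = rng.integers(50, 500, size=30)
    trajs = [np.cumsum(rng.normal(size=(n, 16)), axis=0) for n in lengths]
    raw = np.array([directness(h) for h in trajs])
    corrected = residualize(raw, lengths)
    # Hypothetical difficulty labels; corrected statistics would be
    # compared against these rather than the raw ones.
    difficulty = rng.integers(1, 6, size=30)
    print(np.corrcoef(corrected, difficulty)[0, 1])
```

By construction, the OLS residuals are uncorrelated with log length, so any remaining association with difficulty cannot be explained by that length term alone. A different length model (for example, linear or spline in length) would give different residuals.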

Problem

Research questions and friction points this paper is trying to address.

reasoning models

chain-of-thought

hidden-state trajectories

generation length

trajectory geometry

Innovation

Methods, ideas, or system contributions that make the work stand out.

reasoning models

chain-of-thought

hidden-state trajectories