🤖 AI Summary
This paper addresses the widespread misinterpretation of “reasoning tokens” in large language models (LLMs) as human-like semantic reasoning, clarifying that they are instead the sole persistent, externalized representation of internal computation within an otherwise stateless generative process. Method: We introduce the “State over Tokens” (SoT) framework, formally redefining reasoning tokens as symbolic carriers of latent computational state. Integrating generative-mechanism analysis, formal modeling of state persistence, and empirical interpretability evaluation, we characterize how reasoning tokens functionally mediate correct output generation. Contribution/Results: Our work refutes the conflation of chain-of-thought prompting with genuine cognitive reasoning, establishes SoT as a principled paradigm for decoding LLM internal states, and opens a new research direction in interpretability grounded in explicit state modeling, thereby advancing foundational understanding of LLM computation and transparency.
📝 Abstract
Large Language Models (LLMs) can generate reasoning tokens before their final answer to boost performance on complex tasks. While these sequences read like human thought processes, empirical evidence reveals that they are not a faithful explanation of the model's actual reasoning process. To address this gap between appearance and function, we introduce the State over Tokens (SoT) conceptual framework. SoT reframes reasoning tokens not as a linguistic narrative, but as an externalized computational state -- the sole persistent information carrier across the model's stateless generation cycles. This explains how the tokens can drive correct reasoning without being a faithful explanation when read as text, and it surfaces previously overlooked research questions about these tokens. We argue that to truly understand the processes LLMs actually carry out, research must move beyond reading the reasoning tokens as text and focus on decoding them as state.
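The abstract's central mechanism, that the growing token sequence is the only information persisting between otherwise stateless decoding cycles, can be illustrated with a minimal toy sketch. Here `toy_model` and `generate` are hypothetical stand-ins for illustration, not the paper's method or any real LLM:

```python
def toy_model(tokens):
    """Hypothetical stand-in for an LLM forward pass: maps the full token
    sequence to the next token (here, a trivial deterministic rule)."""
    return (sum(tokens) + len(tokens)) % 10

def generate(prompt_tokens, n_steps):
    """Autoregressive loop: each cycle recomputes everything from the token
    sequence alone; no hidden state survives between iterations."""
    tokens = list(prompt_tokens)
    for _ in range(n_steps):
        next_token = toy_model(tokens)
        tokens.append(next_token)  # the appended token IS the persisted state
    return tokens

print(generate([1, 2, 3], 4))  # -> [1, 2, 3, 9, 9, 9, 9]
```

The point of the sketch is structural: whatever intermediate computation `toy_model` performs is discarded after each step, so any information the process needs later must be written into the token sequence itself, which is exactly the role SoT assigns to reasoning tokens.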