Latent Reasoning with Supervised Thinking States

📅 2026-02-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high latency and computational cost of chain-of-thought (CoT) reasoning in large language models, which stems from generating lengthy intermediate steps. The authors propose dynamically generating “thought states” during input processing by inserting learnable thought tokens at regular intervals within the input sequence, mapping them into an embedding space, and fusing them with subsequent tokens. This approach implicitly encodes CoT’s recursive reasoning into a single forward pass. Trained in parallel using natural language supervision and teacher forcing, the method maintains inference efficiency while preserving learnability. Experiments demonstrate that it outperforms existing implicit reasoning approaches across multiple tasks: achieving near-standard CoT performance on mathematical reasoning, matching 2-hop question answering accuracy with lower latency, and exhibiting stronger generalization on state-tracking tasks.
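The summary above describes a pipeline: every few input tokens, generate a short sequence of thought tokens, map them back into embedding space, and fuse them with the embeddings of the following input tokens. A minimal sketch of that flow is below. All names (`fuse_thinking_states`, `embed`, `generate_thoughts`, the interval `k`, additive mean-pooled fusion) are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch of the "thinking states" input pipeline.
# Assumptions (not from the paper): thought embeddings are mean-pooled
# into one vector and fused additively with subsequent input embeddings.

def fuse_thinking_states(input_ids, embed, generate_thoughts, k=4):
    """Process input_ids left to right; every k tokens, generate thought
    tokens, embed them, and add their pooled embedding to the embeddings
    of all following input tokens (until the next thought state)."""
    fused = []
    thought_vec = None  # most recent pooled thought-state vector
    for i, tok in enumerate(input_ids):
        e = embed(tok)
        if thought_vec is not None:
            # fuse the current thought state into this token's embedding
            e = [a + b for a, b in zip(e, thought_vec)]
        fused.append(e)
        if (i + 1) % k == 0:
            # generate thought tokens conditioned on the prefix so far
            thoughts = generate_thoughts(input_ids[: i + 1])
            vecs = [embed(t) for t in thoughts]
            # mean-pool the thought embeddings into one state vector
            thought_vec = [sum(col) / len(vecs) for col in zip(*vecs)]
    return fused


# Toy usage: 2-dim embeddings, one thought token per interval.
embed = lambda t: [float(t), 1.0]
generate_thoughts = lambda prefix: [sum(prefix)]
out = fuse_thinking_states([1, 2, 3, 4, 5], embed, generate_thoughts, k=2)
```

During training, the thought tokens have natural-language supervision, so the whole sequence can be teacher-forced in parallel rather than decoded autoregressively as in standard CoT.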

📝 Abstract
Reasoning with a chain-of-thought (CoT) enables Large Language Models (LLMs) to solve complex tasks but incurs significant inference costs due to the generation of long rationales. We propose Thinking States, a method that performs reasoning *while* the input is being processed. Specifically, Thinking States generates sequences of thinking tokens every few input tokens, transforms the thoughts back into embedding space, and adds them to the following input tokens. This has two key advantages. First, it captures the recurrent nature of CoT, but with the thought tokens generated as the input is processed. Second, since the thoughts are represented as tokens, they can be learned from natural language supervision using teacher forcing, which is parallelizable. Empirically, Thinking States outperforms other latent reasoning methods on multiple reasoning tasks, narrowing the gap to CoT on math problems and matching its performance on 2-Hop QA with improved latency. On state-tracking tasks, we show that Thinking States leads to stronger reasoning behavior than CoT, successfully extrapolating to longer sequences than seen during training.
Problem

Research questions and friction points this paper is trying to address.

Chain-of-Thought
reasoning
inference cost
Large Language Models
latent reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Thinking States
latent reasoning
chain-of-thought
teacher-forcing
parallelizable inference