🤖 AI Summary
This work addresses the limitations of traditional chain-of-thought methods, which incur high computational costs in discrete token spaces and often collapse onto a single reasoning path, as well as existing implicit reasoning approaches that rely on fixed step counts and lack dynamic termination mechanisms. The authors reformulate implicit reasoning as a planning process with adaptive termination by decoupling reasoning from language generation. Reasoning occurs in a continuous latent space via deterministic state trajectories, while a separate decoder produces textual outputs on demand. This approach is the first to enable adaptive reasoning length, substantially enhancing reasoning diversity and scalability. Experimental results show that, although greedy accuracy is slightly lower than that of baselines, the model explores a broader solution space, providing a transparent and flexible foundation for inference-time search.
📝 Abstract
Chain-of-Thought (CoT) empowers Large Language Models (LLMs) to tackle complex problems, but remains constrained by high computational cost and reasoning-path collapse when grounded in discrete token spaces. Recent latent reasoning approaches attempt to improve efficiency by performing reasoning within continuous hidden states. However, these methods typically operate as opaque end-to-end mappings from explicit reasoning steps to latent states, and often require a pre-defined number of latent steps at inference time. In this work, we introduce PLaT (Planning with Latent Thoughts), a framework that reformulates latent reasoning as planning by fundamentally decoupling reasoning from verbalization. We model reasoning as a deterministic trajectory of latent planning states, while a separate Decoder grounds these thoughts into text when necessary. This decoupling allows the model to dynamically determine when to terminate reasoning rather than relying on a fixed hyperparameter. Empirical results on mathematical benchmarks reveal a distinct trade-off: while PLaT achieves lower greedy accuracy than baselines, it demonstrates superior scalability in terms of reasoning diversity. This indicates that PLaT learns a robust, broader solution space, offering a transparent and scalable foundation for inference-time search. Our code can be found at https://github.com/yunsaijc/PLaT.
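To make the decoupling concrete, the control flow the abstract describes can be sketched as a loop: a planner evolves a latent state deterministically, a termination check decides when to stop (instead of a fixed step count), and a separate decoder verbalizes the final thought on demand. This is a minimal illustrative sketch, not PLaT's actual implementation; every module here (the linear-tanh planner, the norm-based stopping rule, the string-producing decoder) is a stand-in assumption for the learned components.

```python
# Hypothetical sketch of decoupled latent reasoning with adaptive termination.
# All components are illustrative stand-ins, not the PLaT model itself.
import numpy as np

rng = np.random.default_rng(0)
DIM = 16

# Stand-in "planner": a fixed linear map plus tanh gives a deterministic
# trajectory of latent planning states (a contraction for small weights).
W_plan = rng.normal(scale=0.1, size=(DIM, DIM))

def plan_step(state):
    return np.tanh(W_plan @ state)

# Stand-in termination decision: stop when the latent state has converged
# (a proxy for a learned stop signal, so no fixed step count is needed).
def should_stop(prev, curr, tol=1e-3):
    return np.linalg.norm(curr - prev) < tol

# Stand-in decoder: grounds the latent thought into text only when needed.
def decode(state):
    return f"answer decoded from latent state (norm={np.linalg.norm(state):.3f})"

def reason(x0, max_steps=100):
    """Run latent planning steps until the termination check fires."""
    state = x0
    for step in range(1, max_steps + 1):
        nxt = plan_step(state)
        if should_stop(state, nxt):
            return decode(nxt), step  # adaptive reasoning length
        state = nxt
    return decode(state), max_steps  # safety cap

text, steps = reason(rng.normal(size=DIM))
print(f"terminated after {steps} latent steps: {text}")
```

The point of the sketch is the separation of concerns: `plan_step` never touches tokens, and `decode` is only invoked once the trajectory settles, so reasoning length varies per input rather than being a hyperparameter.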