MARCOS: Deep Thinking by Markov Chain of Continuous Thoughts

📅 2025-09-29
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing chain-of-thought (CoT) reasoning in large language models faces three fundamental bottlenecks: autoregressive generation is slow; the discrete token space creates an information bottleneck between reasoning steps; and the tight coupling of thinking with token generation encourages myopic, short-sighted reasoning. This paper introduces MARCOS, a framework that models reasoning as a latent Markov chain in a continuous, high-dimensional hidden space, decoupling "thinking" from "generation" and lifting token-level constraints. Because the latent thoughts are unobserved, MARCOS uses a two-phase variational training scheme to infer and optimize them, and it supports step-wise (rather than token-wise) stochastic control. On three benchmarks, MARCOS is the first continuous-reasoning method to reach parity with token-based CoT, and it surpasses CoT by 4.7% on GSM8K while achieving up to a 15.7× inference speedup, improving both the efficiency and the accuracy of LLM reasoning.

📝 Abstract
The current paradigm for reasoning in large language models (LLMs) involves models "thinking out loud" via a sequence of tokens, known as chain-of-thought (CoT). This approach, while effective, has several significant drawbacks. Firstly, inference requires autoregressive generation of often thousands of CoT tokens, which is slow and computationally expensive. Secondly, it constrains reasoning to the discrete space of tokens, creating an information bottleneck across reasoning steps. Thirdly, it fundamentally entangles reasoning with token generation, forcing LLMs to "think while speaking," which causes potentially short-sighted reasoning. In light of these limitations, we re-imagine reasoning in LLMs and present a new paradigm: MARCOS. In our approach, rather than autoregressively generating tokens, we model reasoning as a hidden Markov chain of continuous, high-dimensional "thoughts". Each reasoning step involves a transition of the internal thoughts, where explicit reasoning steps (which may consist of hundreds of tokens) serve as observable variables, which are windows to peek into the implicit thoughts. Since this latent process is incompatible with the standard supervised learning, we further propose a two-phase variational training scheme. Our experiments on three benchmarks demonstrate that MARCOS outperforms existing continuous reasoning methods and, for the first time, achieves performance comparable to token-based CoT, even surpassing it by 4.7% on GSM8K with up to 15.7x speedup in inference. Beyond this, MARCOS offers additional advantages, such as step-level instead of token-level control over randomness, opening significant opportunities for reinforcement learning and reasoning in LLMs.
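The abstract's core idea can be made concrete with a minimal sketch: each reasoning step is a stochastic transition of a continuous latent "thought" vector, and explicit text (if any) is only decoded from the latent state, never fed back as tokens. Everything below is illustrative; the transition map, decoder, dimensions, and noise scale are hypothetical stand-ins, not the paper's actual learned networks.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 16      # dimensionality of the continuous thought vector (illustrative)
STEPS = 4   # number of latent reasoning steps

# Hypothetical transition: a fixed random linear map stands in for the
# learned transition network; Gaussian noise gives step-level stochasticity.
W = rng.standard_normal((D, D)) / np.sqrt(D)

def transition(z, noise_scale=0.1):
    """One Markov step in continuous thought space: z_{t+1} ~ p(z_{t+1} | z_t)."""
    return np.tanh(W @ z) + noise_scale * rng.standard_normal(D)

def decode(z):
    """Stand-in decoder: maps a latent thought to an observable step summary."""
    return float(z.mean())

z = rng.standard_normal(D)    # initial thought, e.g. encoded from the question
trace = []
for _ in range(STEPS):
    z = transition(z)         # "thinking" happens here, with no token generation
    trace.append(decode(z))   # explicit text would be decoded only when needed

print(len(trace))  # one observable summary per latent reasoning step
```

The point of the sketch is the control flow: the chain advances in latent space in a handful of vector transitions, whereas token-based CoT would autoregressively emit hundreds of tokens per step, which is the source of the reported speedup.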
Problem

Research questions and friction points this paper is trying to address.

Replaces slow autoregressive token generation with continuous thought transitions
Overcomes information bottleneck in discrete token-based reasoning steps
Decouples reasoning process from token generation to prevent short-sighted thinking
Innovation

Methods, ideas, or system contributions that make the work stand out.

Models reasoning as Markov chain of continuous thoughts
Uses two-phase variational training scheme for latent process
Enables step-level control over randomness in reasoning
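Since the thoughts are latent, standard supervised learning does not apply; the paper instead trains variationally. As a hedged illustration of that idea (the Gaussian parameterizations, the inference network, and the reconstruction term below are all hypothetical stand-ins, not the paper's architecture), one maximizes an ELBO per reasoning step: a reconstruction term for the observed step minus a KL term pulling the inferred thought toward the transition prior.

```python
import numpy as np

def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL( N(mu_q, var_q) || N(mu_p, var_p) ) for diagonal Gaussians."""
    var_q, var_p = np.exp(logvar_q), np.exp(logvar_p)
    return 0.5 * np.sum(logvar_p - logvar_q
                        + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

rng = np.random.default_rng(0)
D = 8  # illustrative latent dimension

# Hypothetical inference-network output q(z_t | observed step) and
# transition prior p(z_t | z_{t-1}); both are stand-ins.
mu_q, logvar_q = rng.standard_normal(D), np.full(D, -1.0)
mu_p, logvar_p = np.zeros(D), np.zeros(D)

# Reparameterized sample of the latent thought (enables gradient flow
# in an actual implementation; here it just draws one sample).
z = mu_q + np.exp(0.5 * logvar_q) * rng.standard_normal(D)

# Stand-in reconstruction term: how well z explains the observed step x.
x = rng.standard_normal(D)
recon_loglik = -0.5 * np.sum((x - z) ** 2)

# ELBO = reconstruction - KL; training maximizes this per reasoning step.
elbo = recon_loglik - gaussian_kl(mu_q, logvar_q, mu_p, logvar_p)
print(np.isfinite(elbo))
```

Sampling z per step rather than per token is also what gives the step-level control over randomness noted above: one noise draw governs a whole reasoning step instead of every emitted token.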