Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge

📅 2026-01-13
📈 Citations: 1
✨ Influential: 1
🤖 AI Summary
This work addresses the inefficiency of traditional chain-of-thought (CoT) reasoning, which generates lengthy, discrete token sequences that struggle to capture multi-path reasoning in complex tasks. The authors propose a stochastic soft reasoning mechanism that, at each step, samples K candidate tokens and aggregates their embeddings into a single continuous, multiplexed token, thereby constructing an optimizable probability distribution without increasing sequence length. This approach adaptively reverts to standard CoT when the model is confident and compactly fuses multiple reasoning paths under uncertainty. Optimization is achieved through a token-level branch-and-merge architecture combined with on-policy reinforcement learning. Experiments demonstrate consistent and significant improvements over strong baselines across multiple mathematical reasoning benchmarks, from Pass@1 to Pass@1024, while producing shorter reasoning traces.

๐Ÿ“ Abstract
Large language models often solve complex reasoning tasks more effectively with Chain-of-Thought (CoT), but at the cost of long, low-bandwidth token sequences. Humans, by contrast, often reason softly by maintaining a distribution over plausible next steps. Motivated by this, we propose Multiplex Thinking, a stochastic soft reasoning mechanism that, at each thinking step, samples K candidate tokens and aggregates their embeddings into a single continuous multiplex token. This preserves the vocabulary embedding prior and the sampling dynamics of standard discrete generation, while inducing a tractable probability distribution over multiplex rollouts. Consequently, multiplex trajectories can be directly optimized with on-policy reinforcement learning (RL). Importantly, Multiplex Thinking is self-adaptive: when the model is confident, the multiplex token is nearly discrete and behaves like standard CoT; when it is uncertain, it compactly represents multiple plausible next steps without increasing sequence length. Across challenging math reasoning benchmarks, Multiplex Thinking consistently outperforms strong discrete CoT and RL baselines from Pass@1 through Pass@1024, while producing shorter sequences. The code and checkpoints are available at https://github.com/GMLR-Penn/Multiplex-Thinking.
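The multiplex-token construction described in the abstract can be sketched in a few lines. The following is a minimal NumPy illustration under stated assumptions: sampling the K candidates with replacement and merging their embeddings with weights given by the candidates' renormalized probabilities. The exact sampling and weighting scheme of the paper may differ; this is not the authors' implementation.

```python
import numpy as np

def multiplex_token(logits, embedding, k=4, seed=None):
    """Build one continuous 'multiplex' token: sample k candidate next
    tokens and merge their embeddings into a single vector.

    Sketch only. The sampling scheme (with replacement) and the merge
    weights (renormalized candidate probabilities) are assumptions.
    """
    rng = np.random.default_rng(seed)
    # Standard softmax over the vocabulary, as in discrete decoding.
    z = logits - logits.max()
    p = np.exp(z) / np.exp(z).sum()
    # Sample k candidate next tokens from the model's distribution.
    idx = rng.choice(len(p), size=k, p=p)
    cand = np.unique(idx)
    # Merge weights: the sampled candidates' probabilities, renormalized.
    w = p[cand] / p[cand].sum()
    # Convex combination of the candidates' embeddings: one token slot
    # that compactly carries several plausible next steps.
    return w @ embedding[cand]
```

Note the self-adaptive behavior the abstract describes falls out of the construction: when the distribution is peaked, all k samples tend to coincide, so the multiplex token collapses to a single token's embedding and decoding behaves like standard discrete CoT.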
Problem

Research questions and friction points this paper is trying to address.

Chain-of-Thought
reasoning efficiency
token sequence length
stochastic reasoning
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multiplex Thinking
soft reasoning
token-wise branch-and-merge
on-policy reinforcement learning
self-adaptive reasoning