Parallel-Probe: Towards Efficient Parallel Thinking via 2D Probing

📅 2026-02-03
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work proposes Parallel-Probe, an approach to efficient parallel thinking that targets two problems: the high computational cost of running many reasoning branches, and the reliance of existing efficiency methods on local, per-branch signals that ignore global dynamics across branches. The study introduces 2D probing, a mechanism that periodically elicits intermediate answers from all branches, and uses it to uncover three patterns: non-monotonic width-depth trade-offs, heterogeneous branch lengths, and early stabilization of global consensus. Building on these insights, the authors design a training-free online controller that limits reasoning depth with consensus-based early stopping and adjusts width with deviation-based branch pruning. Experiments across three benchmarks and multiple models show that Parallel-Probe achieves a better accuracy-cost trade-off than standard majority voting, reducing sequential token consumption by up to 35.8% and total token cost by up to 25.8% while maintaining competitive accuracy.
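The 2D probing view can be pictured as a grid indexed by probe step (depth) and branch (width). The sketch below is a hypothetical illustration, not the paper's code: it records each branch's intermediate answer at each probe, computes the global majority answer per probe, and finds the earliest probe after which that majority never changes, one simple way to observe the "early stabilization of global consensus" described above. The grid layout, the use of `None` for branches that have not yet answered, and the names `consensus_trace` and `stabilization_step` are all assumptions.

```python
from collections import Counter
from typing import List, Optional

Grid = List[List[Optional[str]]]  # grid[t][b]: branch b's answer at probe t (None = no answer yet)


def majority(answers: List[Optional[str]]) -> Optional[str]:
    """Most common non-None answer; ties go to the answer seen first."""
    votes = Counter(a for a in answers if a is not None)
    return votes.most_common(1)[0][0] if votes else None


def consensus_trace(grid: Grid) -> List[Optional[str]]:
    """Global majority answer at each probe step (depth), taken across all branches (width)."""
    return [majority(row) for row in grid]


def stabilization_step(grid: Grid) -> int:
    """Earliest probe index from which the majority answer never changes again."""
    trace = consensus_trace(grid)
    step = len(trace) - 1
    while step > 0 and trace[step - 1] == trace[step]:
        step -= 1
    return step


# Hypothetical grid: 5 probes (rows) x 4 branches (columns).
grid = [
    [None, "12", None, "7"],
    ["7", "12", "7", "7"],
    ["7", "12", "7", "7"],
    ["7", "9", "7", "7"],
    ["7", "7", "7", "7"],
]
print(consensus_trace(grid))     # ['12', '7', '7', '7', '7']
print(stabilization_step(grid))  # 1
```

In this made-up grid the majority settles at the second probe even though one branch keeps changing its answer until the end, so the last three probes add tokens without changing the vote.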

📝 Abstract
Parallel thinking has emerged as a promising paradigm for reasoning, yet it imposes significant computational burdens. Existing efficiency methods primarily rely on local, per-trajectory signals and lack principled mechanisms to exploit global dynamics across parallel branches. We introduce 2D probing, an interface that exposes the width-depth dynamics of parallel thinking by periodically eliciting intermediate answers from all branches. Our analysis reveals three key insights: non-monotonic scaling across width-depth allocations, heterogeneous reasoning branch lengths, and early stabilization of global consensus. Guided by these insights, we introduce Parallel-Probe, a training-free controller designed to optimize online parallel thinking. Parallel-Probe employs consensus-based early stopping to regulate reasoning depth and deviation-based branch pruning to dynamically adjust width. Extensive experiments across three benchmarks and multiple models demonstrate that Parallel-Probe establishes a superior Pareto frontier for test-time scaling. Compared to standard majority voting, it reduces sequential tokens by up to 35.8% and total token cost by over 25.8% while maintaining competitive accuracy.
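A minimal sketch of an online controller in the spirit of the abstract, replaying a precomputed probe grid instead of calling a model. The abstract does not give the exact rules, so the following are assumptions: generation stops once the majority answer among live branches has been the same for `patience` consecutive probes, and a branch is pruned after disagreeing with that majority at `max_deviations` consecutive probes. Sequential tokens are approximated by the longest branch and total tokens by the sum over branches. The function `simulate_controller` and its parameters are hypothetical names.

```python
from collections import Counter
from typing import List, Optional, Tuple

Grid = List[List[Optional[str]]]  # grid[t][b]: branch b's answer at probe t (None = no answer yet)


def simulate_controller(
    grid: Grid,
    tokens_per_probe: int = 256,
    patience: int = 2,
    max_deviations: int = 2,
) -> Tuple[Optional[str], int, int]:
    """Replay a probe grid with consensus-based stopping and deviation-based pruning.

    Returns (final majority answer, sequential tokens, total tokens).
    """
    width = len(grid[0])
    active = set(range(width))   # branches still decoding
    deviations = [0] * width     # consecutive disagreements with the majority
    tokens = [0] * width         # tokens decoded per branch
    previous: Optional[str] = None
    streak = 0
    answer: Optional[str] = None

    for row in grid:
        # Every live branch decodes one chunk of tokens before the next probe.
        for b in active:
            tokens[b] += tokens_per_probe

        # Majority vote over live branches that have produced an answer.
        votes = Counter(row[b] for b in sorted(active) if row[b] is not None)
        answer = votes.most_common(1)[0][0] if votes else None

        # Depth control: stop once the majority has held for `patience` probes.
        if answer is not None and answer == previous:
            streak += 1
        else:
            streak = 1 if answer is not None else 0
        previous = answer
        if streak >= patience:
            break

        # Width control: prune branches that keep disagreeing with the majority.
        for b in sorted(active):
            if answer is None or row[b] is None:
                continue
            deviations[b] = deviations[b] + 1 if row[b] != answer else 0
            if deviations[b] >= max_deviations and len(active) > 1:
                active.discard(b)

    sequential = max(tokens)  # latency proxy: the longest branch
    total = sum(tokens)       # compute proxy: all branches combined
    return answer, sequential, total


# Hypothetical grid: 5 probes x 3 branches; branch 0 disagrees for the first three probes.
grid = [["3", "5", "5"]] * 3 + [["5", "5", "5"]] * 2
print(simulate_controller(grid, tokens_per_probe=256, patience=3, max_deviations=2))
# ('5', 768, 2048)
```

With these made-up numbers, branch 0 is pruned after its second disagreement and the run stops at the third probe, giving 768 sequential and 2,048 total tokens, compared with 1,280 and 3,840 when all three branches run through all five probes. The thresholds trade accuracy for cost and would need tuning in practice.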
Problem

parallel thinking
computational efficiency
reasoning
global dynamics
token cost
Innovation

parallel thinking
2D probing
consensus-based early stopping
branch pruning
token efficiency