Share More, Search Less: Collaborative Parallel Thinking for Efficient Test-Time Scaling

📅 2026-05-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing parallel test-time scaling methods keep reasoning branches isolated, which leads to redundant exploration and wasted compute. This work proposes Collaborative Parallel Thinking (CPT), a training-free inference framework that lets parallel branches share and reuse information during search. CPT has three components: it extracts compact intermediate information from ongoing branches, maintains a deduplicated query-level information pool, and broadcasts pool entries to every branch through the input context, so branches reuse discoveries instead of rediscovering them. On the HMMT and AIME benchmarks, CPT establishes a stronger accuracy–latency Pareto frontier than strong baselines across rollout budgets and model scales.
📝 Abstract
Test-Time Scaling (TTS) enhances the reasoning capabilities of large language models by allocating additional inference compute to explore the solution space. However, existing parallel TTS methods typically keep branches isolated during search: intermediate discoveries remain branch-private and cannot guide other branches in time. This information isolation causes substantial redundant exploration, because branches repeatedly rediscover information already found elsewhere and need more search steps to collect the complete decision information required to reach correct answers. To bridge this gap, we propose Collaborative Parallel Thinking (CPT), a training-free inference framework that enables search-time information sharing across parallel branches. CPT extracts compact intermediate information from ongoing branches, maintains a deduplicated query-level information pool, and broadcasts pool entries through the input context, allowing each branch in subsequent search steps to reuse discoveries made by other branches rather than rediscover the same information. Experiments on the HMMT and AIME benchmarks show that CPT establishes a stronger accuracy–latency Pareto frontier than strong baselines across rollout budgets and model scales, highlighting search-time collaboration as an effective direction for efficient parallel TTS.
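The extract, pool, and broadcast loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify how information is extracted or how duplicates are detected, so the names `InfoPool`, `normalize`, `build_context`, and `collaborative_search` are hypothetical, deduplication is assumed to be a simple case- and whitespace-insensitive string match, and each branch is stood in for by a plain callable that receives the shared context and returns the findings extracted from its latest step.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set

# Hypothetical stand-in for one branch step: takes the broadcast context,
# returns the compact findings extracted from that step.
BranchStep = Callable[[str], List[str]]


def normalize(text: str) -> str:
    """Canonical key for duplicate detection (assumption: ignore case and spacing)."""
    return " ".join(text.lower().split())


@dataclass
class InfoPool:
    """Query-level pool of intermediate findings, deduplicated on insert."""
    entries: List[str] = field(default_factory=list)
    _seen: Set[str] = field(default_factory=set)

    def add(self, finding: str) -> bool:
        """Store a finding; return False if it is empty or already pooled."""
        key = normalize(finding)
        if not key or key in self._seen:
            return False
        self._seen.add(key)
        self.entries.append(finding.strip())
        return True


def build_context(query: str, pool: InfoPool) -> str:
    """Broadcast step: prepend all pooled findings to the branch input."""
    lines = [f"Problem: {query}"]
    if pool.entries:
        lines.append("Findings shared by other branches:")
        lines.extend(f"- {entry}" for entry in pool.entries)
    return "\n".join(lines)


def collaborative_search(query: str, branches: List[BranchStep], num_steps: int) -> InfoPool:
    """Run all branches for num_steps rounds, sharing findings between rounds."""
    pool = InfoPool()
    for _ in range(num_steps):
        # Every branch in a round sees the same pool snapshot.
        context = build_context(query, pool)
        round_findings = [f for step in branches for f in step(context)]
        # Merge after the round so later rounds can reuse the discoveries.
        for finding in round_findings:
            pool.add(finding)
    return pool


if __name__ == "__main__":
    # Toy branches that report fixed findings regardless of context.
    branch_a = lambda ctx: ["x = 2 is a root"]
    branch_b = lambda ctx: ["X = 2  is a root", "the polynomial is even"]
    shared = collaborative_search("toy problem", [branch_a, branch_b], num_steps=2)
    print(build_context("toy problem", shared))
```

In a real system the branch callables would wrap language-model calls, and the duplicate check would likely need to be semantic rather than string-based; both are left as assumptions here.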
Problem

Research questions and friction points this paper is trying to address.

Test-Time Scaling
parallel search
information isolation
redundant exploration
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Collaborative Parallel Thinking
Test-Time Scaling
information sharing
parallel search
inference efficiency