When to Vote, When to Rewrite: Disagreement-Guided Strategy Routing for Test-Time Scaling

📅 2026-04-29

🤖 AI Summary
Current test-time scaling methods for hard mathematical problems are computationally expensive and show diminishing returns. This work formulates test-time scaling as an instance-level strategy routing problem guided by output disagreement. It introduces a training-free mechanism that, depending on how much the sampled outputs for an instance disagree, selects among lightweight resolution, majority voting, and rewriting-based reformulation. Across seven mathematical benchmarks and three models, the method improves accuracy by 3%–7% while reducing sampling cost compared to existing approaches.
📝 Abstract
Large Reasoning Models (LRMs) achieve strong performance on mathematical reasoning tasks but remain unreliable on challenging instances. Existing test-time scaling methods, such as repeated sampling, self-correction, and tree search, improve performance at the cost of increased computation, yet often exhibit diminishing returns on hard problems. We observe that output disagreement is strongly correlated with instance difficulty and prediction correctness, providing a useful signal for guiding instance-level strategy selection at test time. Based on this insight, we propose a training-free framework that formulates test-time scaling as an instance-level routing problem. Rather than allocating more computation within a single strategy, the framework dynamically selects among different scaling strategies based on output disagreement. It applies lightweight resolution for consistent cases, majority voting for moderate disagreement, and rewriting-based reformulation for highly ambiguous instances. Experiments on seven mathematical benchmarks and three models show that our method improves accuracy by 3%–7% while reducing sampling cost compared to existing approaches.
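The three-way routing described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the disagreement measure (one minus the share of the most common answer), the thresholds `low` and `high`, and the function names `disagreement`, `route`, and `majority_vote` are all assumptions introduced here, since the abstract does not specify them.

```python
from collections import Counter
from typing import Sequence


def disagreement(answers: Sequence[str]) -> float:
    """Assumed metric: 1 minus the share of the most common answer.

    0.0 means all sampled answers agree; values near 1.0 mean
    the samples are scattered across many different answers.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    top_count = Counter(answers).most_common(1)[0][1]
    return 1.0 - top_count / len(answers)


def route(answers: Sequence[str], low: float = 0.2, high: float = 0.6) -> str:
    """Pick a strategy label from the disagreement of a few samples.

    The thresholds `low` and `high` are hypothetical placeholders.
    """
    d = disagreement(answers)
    if d <= low:
        return "lightweight"  # consistent case: resolve cheaply
    if d <= high:
        return "vote"         # moderate disagreement: majority voting
    return "rewrite"          # high disagreement: reformulate the problem


def majority_vote(answers: Sequence[str]) -> str:
    """Return the most frequent answer among the samples."""
    return Counter(answers).most_common(1)[0][0]
```

For example, under these assumed thresholds, four identical samples would be routed to the lightweight path, a 3-to-2 split would go to majority voting, and four mutually different answers would trigger a rewrite.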
Problem

Research questions and friction points this paper is trying to address.

test-time scaling
large reasoning models
instance difficulty
strategy selection
output disagreement
Innovation

Methods, ideas, or system contributions that make the work stand out.

test-time scaling
disagreement-guided routing
strategy selection
reasoning models
training-free framework