Cut Your Losses! Learning to Prune Paths Early for Efficient Parallel Reasoning

📅 2026-04-17

🤖 AI Summary

This work addresses the high computational cost of parallel reasoning, where early errors send many reasoning paths down futile continuations. The paper introduces the first systematic taxonomy of path pruning and, grounded in that taxonomy, proposes STOP (Super TOken for Pruning), a learnable pruning method based on internal signals. STOP uses a Super Token mechanism to dynamically model internal confidence at the prefix level, so that unpromising paths can be eliminated during inference. Evaluated on models ranging from 1.5B to 20B parameters, STOP consistently outperforms existing baselines. Notably, under fixed compute budgets, it raises the accuracy of GPT-OSS-20B on AIME25 from 84% to nearly 90%. The paper also offers practical deployment guidelines for real-world applications.
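To make the idea of prefix-level pruning concrete, here is a minimal sketch of the general pattern: score each partial reasoning path, then continue only the most promising ones. It is not the STOP implementation. The function `prune_paths`, the `keep_ratio` parameter, and the toy scores are all hypothetical; in STOP the score would come from a learned internal signal, which this summary does not specify in detail.

```python
from typing import Callable, List, Sequence


def prune_paths(
    prefixes: Sequence[str],
    score_prefix: Callable[[str], float],
    keep_ratio: float = 0.5,
) -> List[int]:
    """Return the indices of the prefixes to keep, in their original order.

    `score_prefix` is a hypothetical stand-in for any confidence signal
    computed on a partial reasoning path (higher means more promising).
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    # Always keep at least one path so generation can continue.
    n_keep = max(1, int(len(prefixes) * keep_ratio))
    scores = [score_prefix(p) for p in prefixes]
    # Rank path indices from highest to lowest score.
    ranked = sorted(range(len(prefixes)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


# Toy usage: four partial paths with made-up confidence scores.
toy_scores = {"path-a": 0.9, "path-b": 0.1, "path-c": 0.7, "path-d": 0.3}
kept = prune_paths(list(toy_scores), toy_scores.__getitem__, keep_ratio=0.5)
print(kept)
```

Only the kept paths would be decoded further, so the compute saved on the discarded ones can be spent on additional or longer surviving paths under the same budget.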

📝 Abstract

Parallel reasoning enhances Large Reasoning Models (LRMs) but incurs prohibitive costs due to futile paths caused by early errors. To mitigate this, path pruning at the prefix level is essential, yet existing research remains fragmented without a standardized framework. In this work, we propose the first systematic taxonomy of path pruning, categorizing methods by their signal source (internal vs. external) and learnability (learnable vs. non-learnable). This classification reveals the unexplored potential of learnable internal methods, motivating our proposal of STOP (Super TOken for Pruning). Extensive evaluations across LRMs ranging from 1.5B to 20B parameters demonstrate that STOP achieves superior effectiveness and efficiency compared to existing baselines. Furthermore, we rigorously validate the scalability of STOP under varying compute budgets: for instance, it boosts GPT-OSS-20B accuracy on AIME25 from 84% to nearly 90% under fixed compute budgets. Finally, we distill our findings into formalized empirical guidelines to facilitate optimal real-world deployment. Code, data and models are available at https://bijiaxihh.github.io/STOP
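The two-axis taxonomy described in the abstract can be written down as a small data structure. The sketch below is illustrative only: the class and constant names are hypothetical, and only the axis values (internal vs. external, learnable vs. non-learnable) and the placement of STOP in the learnable internal cell come from the abstract.

```python
from dataclasses import dataclass
from enum import Enum


class SignalSource(Enum):
    INTERNAL = "internal"
    EXTERNAL = "external"


class Learnability(Enum):
    LEARNABLE = "learnable"
    NON_LEARNABLE = "non-learnable"


@dataclass(frozen=True)
class PruningCategory:
    """One cell of the 2x2 taxonomy (names here are hypothetical)."""

    source: SignalSource
    learnability: Learnability

    def label(self) -> str:
        return f"{self.learnability.value} {self.source.value}"


# Enumerate all four cells of the taxonomy.
ALL_CATEGORIES = [
    PruningCategory(source, learnability)
    for source in SignalSource
    for learnability in Learnability
]

# The abstract places STOP in the learnable internal cell.
STOP_CATEGORY = PruningCategory(SignalSource.INTERNAL, Learnability.LEARNABLE)

for category in ALL_CATEGORIES:
    marker = " (STOP)" if category == STOP_CATEGORY else ""
    print(category.label() + marker)
```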

Problem

Research questions and friction points this paper is trying to address.

path pruning

parallel reasoning

Large Reasoning Models

compute efficiency

futile paths

Innovation

Methods, ideas, or system contributions that make the work stand out.

path pruning

learnable internal signals

STOP