Aligning Tree-Search Policies with Fixed Token Budgets in Test-Time Scaling of LLMs

📅 2026-02-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a critical limitation in existing tree-search decoding methods for large language model inference: their disregard for fixed token budget constraints, which often leads to excessive branching in later stages or premature termination. To overcome this, the authors propose Budget-Guided Monte Carlo Tree Search (BG-MCTS), the first approach to explicitly integrate token budgets into the tree-search strategy. BG-MCTS dynamically balances exploration and exploitation: it favors broad exploration early on, then progressively focuses on answer refinement and suppresses late-stage branching at shallow nodes as the budget depletes. By incorporating dynamic branch control and priority-based scheduling, BG-MCTS consistently outperforms existing budget-agnostic tree-search methods across varying token budgets on the MATH500 and AIME24/25 benchmarks using open-source large language models.
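The budget-aligned branch control described in the summary can be sketched as a rule that shrinks a node's allowed branching factor as the token budget depletes, restricting shallow nodes first. The function name, formula, and parameters below are illustrative assumptions for intuition only, not the paper's actual policy.

```python
import math

def branch_limit(remaining_budget: int, total_budget: int,
                 depth: int, max_branches: int = 4) -> int:
    """Hypothetical budget-aware branching rule (not the paper's formula):
    allow broad branching while most of the budget remains, and suppress
    new branches, shallow nodes first, as the budget runs out."""
    frac = remaining_budget / total_budget  # fraction of budget left
    # Raising frac to 1/(depth+1) makes the cut bite hardest at depth 0,
    # mirroring the idea of suppressing late-stage branching at shallow nodes.
    allowed = math.ceil(max_branches * frac ** (1 / (depth + 1)))
    return max(1, min(max_branches, allowed))
```

With a 1000-token budget, a full budget permits the maximum 4 branches everywhere; at 10% remaining, a root-level node is limited to a single continuation while a depth-3 node may still branch, which is the qualitative behavior the summary attributes to BG-MCTS.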

📝 Abstract
Tree-search decoding is an effective form of test-time scaling for large language models (LLMs), but real-world deployment imposes a fixed per-query token budget that varies across settings. Existing tree-search policies are largely budget-agnostic, treating the budget as a termination condition, which can lead to late-stage over-branching or premature termination. We propose Budget-Guided MCTS (BG-MCTS), a tree-search decoding algorithm that aligns its search policy with the remaining token budget: it starts with broad exploration, then prioritizes refinement and answer completion as the budget depletes while reducing late-stage branching from shallow nodes. BG-MCTS consistently outperforms budget-agnostic tree-search baselines across different budgets on MATH500 and AIME24/25 with open-weight LLMs.
Problem

Research questions and friction points this paper is trying to address.

tree-search decoding
token budget
test-time scaling
large language models
budget-awareness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Budget-Guided MCTS
tree-search decoding
token budget
test-time scaling
large language models
Sora Miyamoto
Department of Computer Science, School of Computing, Institute of Science Tokyo, Japan
Daisuke Oba
Department of Computer Science, School of Computing, Institute of Science Tokyo, Japan
Naoaki Okazaki
Institute of Science Tokyo
natural language processing
artificial intelligence
machine learning