🤖 AI Summary
This work addresses a critical limitation in existing tree search decoding methods for large language model inference: their disregard for fixed token budget constraints, which often leads to excessive branching in later stages or premature termination. To overcome this, the authors propose Budget-Guided Monte Carlo Tree Search (BG-MCTS), the first approach to explicitly integrate token budgets into the tree search strategy. BG-MCTS dynamically balances exploration and exploitation—favoring broad exploration early on and progressively focusing on answer refinement while suppressing late-stage branching at shallow nodes as the budget depletes. By incorporating dynamic branch control and priority-based scheduling, BG-MCTS consistently outperforms existing budget-agnostic tree search methods across varying token budgets on the MATH500 and AIME24/25 benchmarks using open-source large language models.
📝 Abstract
Tree-search decoding is an effective form of test-time scaling for large language models (LLMs), but real-world deployment imposes a fixed per-query token budget that varies across settings. Existing tree-search policies are largely budget-agnostic, treating the budget as a termination condition, which can lead to late-stage over-branching or premature termination. We propose Budget-Guided MCTS (BG-MCTS), a tree-search decoding algorithm that aligns its search policy with the remaining token budget: it starts with broad exploration, then prioritizes refinement and answer completion as the budget depletes while reducing late-stage branching from shallow nodes. BG-MCTS consistently outperforms budget-agnostic tree-search baselines across different budgets on MATH500 and AIME24/25 with open-weight LLMs.
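The abstract's two mechanisms (budget-dependent branch control and budget-weighted node priority) can be sketched as follows. This is a minimal illustration under assumed formulas, not the paper's actual rules: the function names `branch_limit` and `selection_priority`, the linear budget scaling, and the shallow-node penalty are all hypothetical choices standing in for BG-MCTS's real policy.

```python
import math

def branch_limit(remaining_frac: float, depth: int, max_children: int = 4) -> int:
    """Allowed number of children for a node (hypothetical rule).

    Broad branching while most of the token budget remains; as the budget
    depletes, branching narrows, and shallow nodes (small depth) are
    suppressed hardest, mirroring the paper's late-stage behavior.
    """
    # Penalty grows as the budget empties and is strongest at shallow depth.
    shallow_penalty = (1.0 - remaining_frac) / (depth + 1)
    allowed = max_children * remaining_frac - shallow_penalty
    return max(1, int(round(allowed)))

def selection_priority(value: float, visits: int, parent_visits: int,
                       remaining_frac: float, c: float = 1.4) -> float:
    """UCT-style node priority with a budget-weighted exploration bonus.

    With a full budget this behaves like standard UCT (explore broadly);
    as remaining_frac -> 0 the bonus vanishes and selection becomes pure
    exploitation of high-value (refinement/answer-completion) nodes.
    """
    explore = c * remaining_frac * math.sqrt(
        math.log(parent_visits + 1) / (visits + 1))
    return value + explore
```

With a full budget a shallow node may expand up to `max_children` branches, while near exhaustion the same node is capped at a single child, so remaining tokens flow into completing existing answer paths rather than opening new ones.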