AI Summary
Large language models (LLMs) struggle to scale inference-time computation effectively without external feedback signals. Method: This paper proposes an adaptive branching tree search framework that integrates external feedback directly into the inference-time search process. It introduces a dynamic "wide-or-deep" decision mechanism: at each tree node, the search adaptively chooses between expanding new candidate responses (going wider) or iteratively refining existing ones (going deeper). The framework combines diversity-aware response generation, feedback-driven node evaluation, and an enhanced Monte Carlo Tree Search (MCTS) algorithm. Contribution/Results: Experiments show substantial improvements over repeated sampling and standard MCTS on code generation and engineering tasks. The results highlight the critical role of combining exploration and exploitation in inference-time scaling and establish a new paradigm for enhancing LLM reasoning with external feedback.
Abstract
Recent advances demonstrate that increasing inference-time computation can significantly boost the reasoning capabilities of large language models (LLMs). Although repeated sampling (i.e., generating multiple candidate outputs) is a highly effective strategy, it does not leverage external feedback signals for refinement, which are often available in tasks like coding. In this work, we propose Adaptive Branching Monte Carlo Tree Search (AB-MCTS), a novel inference-time framework that generalizes repeated sampling with principled multi-turn exploration and exploitation. At each node in the search tree, AB-MCTS dynamically decides whether to "go wider" by expanding new candidate responses or "go deeper" by revisiting existing ones based on external feedback signals. We evaluate our method on complex coding and engineering tasks using frontier models. Empirical results show that AB-MCTS consistently outperforms both repeated sampling and standard MCTS, underscoring the importance of combining the response diversity of LLMs with multi-turn solution refinement for effective inference-time scaling.
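The wide-or-deep decision described above can be illustrated with a minimal sketch. This is not the paper's actual algorithm (AB-MCTS uses a principled MCTS-based selection rule): it is a toy heuristic, with placeholder `generate`, `refine`, and `feedback` functions standing in for LLM calls and an external evaluator (e.g., a unit-test pass rate for coding tasks), and a hypothetical `widen_threshold` parameter controlling the trade-off.

```python
import random

def generate():
    # Placeholder: sample a fresh candidate response from the LLM.
    return {"text": f"candidate-{random.random():.3f}"}

def refine(candidate):
    # Placeholder: revise an existing candidate using feedback.
    return {"text": candidate["text"] + "+refined"}

def feedback(candidate):
    # Placeholder score in [0, 1]; in coding tasks this could be
    # the fraction of unit tests the candidate passes.
    return random.random()

def wide_or_deep_search(budget=8, widen_threshold=0.5):
    """Toy wide-or-deep loop at a single tree node: go wider when no
    child looks promising yet, otherwise go deeper on the best child."""
    children = []  # list of (score, candidate) pairs
    for _ in range(budget):
        best = max(children, default=None, key=lambda c: c[0])
        if best is None or best[0] < widen_threshold:
            cand = generate()       # "go wider": add a new sibling
        else:
            cand = refine(best[1])  # "go deeper": refine the best child
        children.append((feedback(cand), cand))
    return max(children, key=lambda c: c[0])

best_score, best_candidate = wide_or_deep_search()
```

Repeated sampling corresponds to always taking the "go wider" branch; the sketch instead spends part of the budget refining whichever candidate external feedback rates highest.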