Adaptive Parallel Monte Carlo Tree Search for Efficient Test-time Compute Scaling

📅 2026-04-01
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the highly variable execution time and long-tail latency of Monte Carlo Tree Search (MCTS) when it is used for test-time compute scaling. The latency stems from unproductive search trajectories, and existing optimizations lose effectiveness when search progress stalls. The authors propose a negative early-exit mechanism that prunes unproductive trajectories, together with an adaptive boosting strategy that reallocates the freed compute to reduce resource contention among parallel searches. Implemented in the vLLM inference framework, the approach substantially reduces end-to-end p99 latency and improves throughput while preserving the reasoning accuracy of the large language model.
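
To make the idea of pruning a stalled search concrete, here is a minimal Python sketch. The summary does not state the paper's actual exit criterion, so the rule used here (stop after `patience` consecutive iterations without an improvement of at least `min_delta` in the best value) is an assumption, and the names `NegativeExitMonitor` and `iterations_until_exit` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NegativeExitMonitor:
    """Hypothetical stall detector; the criterion is assumed, not the paper's."""
    patience: int = 3          # assumed: stalled iterations tolerated before pruning
    min_delta: float = 0.01    # assumed: smallest gain that counts as progress
    best: float = float("-inf")
    stalled: int = 0

    def update(self, value: float) -> bool:
        """Record the best value seen after one MCTS iteration.

        Returns True when the search should be pruned (negative early exit).
        """
        if value > self.best + self.min_delta:
            self.best = value
            self.stalled = 0
        else:
            self.stalled += 1
        return self.stalled >= self.patience


def iterations_until_exit(values, monitor):
    """Return the 1-based iteration at which the monitor prunes, or None."""
    for i, v in enumerate(values, start=1):
        if monitor.update(v):
            return i
    return None
```

A search whose best value keeps improving is never cut off by this rule, while one that plateaus is stopped after `patience` stalled iterations, which frees its compute for other searches.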

📝 Abstract
Monte Carlo Tree Search (MCTS) is an effective test-time compute scaling (TTCS) method for improving the reasoning performance of large language models, but its highly variable execution time leads to severe long-tail latency in practice. Existing optimizations, such as positive early exit, reduce latency in favorable cases but are less effective when search continues without meaningful progress. We introduce negative early exit, which prunes unproductive MCTS trajectories, and an adaptive boosting mechanism that reallocates reclaimed computation to reduce resource contention among concurrent searches. Integrated into vLLM, these techniques substantially reduce p99 end-to-end latency while improving throughput and maintaining reasoning accuracy.
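
The reallocation step can be illustrated with a small Python sketch. The abstract does not say how reclaimed computation is distributed, so the even split below is an assumption, as are the function name `reallocate` and the use of a per-search rollout budget as the unit of compute.

```python
from typing import Dict, Set


def reallocate(budgets: Dict[str, int], pruned: Set[str]) -> Dict[str, int]:
    """Split the budget reclaimed from pruned searches across the survivors.

    Assumed policy: an even split, with any remainder given one unit at a
    time to the first survivors in sorted order. Total budget is conserved.
    """
    active = sorted(k for k in budgets if k not in pruned)
    if not active:
        return {}
    reclaimed = sum(b for k, b in budgets.items() if k in pruned)
    share, extra = divmod(reclaimed, len(active))
    return {
        k: budgets[k] + share + (1 if i < extra else 0)
        for i, k in enumerate(active)
    }
```

For example, with three concurrent searches holding 10 rollouts each, pruning one of them leaves the other two with 15 rollouts each under this policy.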
Problem

Research questions and friction points this paper is trying to address.

Monte Carlo Tree Search
test-time compute scaling
long-tail latency
execution time variability
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

negative early exit
adaptive boosting
Monte Carlo Tree Search
test-time compute scaling
latency optimization