🤖 AI Summary
In classical planning, Monte Carlo Tree Search (MCTS) incurs O(log N) overhead per node selection because its OPEN list is tree-structured; this cost roughly tracks the search depth d and becomes significant when d grows large. To address this bottleneck, the paper proposes a bilevel MCTS framework: the upper level maintains the usual bandit tree, while the lower level runs a best-first search from each selected leaf node under an expansion budget proportional to d. This achieves amortized O(1) node selection, matching the traditional array-based OPEN lists used in classical planning. The paper additionally introduces Tree Collapsing, an enhancement that reduces the number of action selection steps. Experiments reported in the paper indicate that the approach substantially reduces node selection overhead in deep-search regimes while preserving solution quality.
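The bilevel idea can be sketched as follows. This is an illustrative toy, not the paper's code: the helper names `select_leaf`, `successors`, and `h` are our assumptions, standing in for the upper-level bandit descent, the planning task's successor function, and a heuristic.

```python
import heapq
import itertools

def bilevel_expand(select_leaf, successors, h, depth):
    """One bilevel iteration (a sketch of the idea, not the paper's code).

    Upper level: select_leaf() descends the bandit tree, costing O(d).
    Lower level: best-first search from that leaf with an expansion
    budget proportional to d, so the O(d) selection cost is amortized
    over Theta(d) expansions, i.e., O(1) per node expansion.
    """
    leaf = select_leaf()                 # hypothetical upper-level descent
    counter = itertools.count()          # tie-breaker for the local heap
    local_open = [(h(leaf), next(counter), leaf)]
    expanded = []
    budget = depth                       # expansion budget proportional to d
    while local_open and len(expanded) < budget:
        _, _, node = heapq.heappop(local_open)
        expanded.append(node)
        for child in successors(node):
            heapq.heappush(local_open, (h(child), next(counter), child))
    return expanded  # new nodes to attach under the upper-level tree
```

Because the lower-level OPEN list is local and bounded by the budget, its maintenance cost does not grow with the total tree size N.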
📝 Abstract
We study an efficient implementation of Multi-Armed Bandit (MAB)-based Monte-Carlo Tree Search (MCTS) for classical planning. One weakness of MCTS is that it spends significant time deciding which node to expand next. While selecting a node from an OPEN list with $N$ nodes has $O(1)$ runtime complexity with traditional array-based priority queues for dense integer keys, the tree-based OPEN list used by MCTS requires $O(\log N)$, which roughly corresponds to the search depth $d$. In classical planning, $d$ is arbitrarily large (e.g., $2^k-1$ in $k$-disk Tower-of-Hanoi) and the runtime for node selection is significant, unlike in game tree search, where the cost is negligible compared to the node evaluation (rollouts) because $d$ is inherently limited by the game (e.g., $d \leq 361$ in Go). To address this bottleneck, we propose a bilevel modification to MCTS that runs a best-first search from each selected leaf node with an expansion budget proportional to $d$, which achieves amortized $O(1)$ runtime for node selection, equivalent to the traditional queue-based OPEN list. In addition, we introduce Tree Collapsing, an enhancement that reduces action selection steps and further improves the performance.
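The array-based priority queue for dense integer keys mentioned above can be sketched as a bucket queue. The class name `BucketQueue` is ours; this is a minimal illustration of why selection is amortized $O(1)$, not the paper's implementation.

```python
class BucketQueue:
    """Array of buckets indexed by a dense integer priority key.

    pop_min() is amortized O(1) when keys are bounded and the
    sequence of minima rarely decreases (as in best-first search):
    the scan cursor mostly moves forward, so the total scan cost
    is bounded by the key range rather than by log N per operation.
    """
    def __init__(self, max_key):
        self.buckets = [[] for _ in range(max_key + 1)]
        self.cursor = 0   # lowest possibly non-empty bucket
        self.size = 0

    def push(self, key, item):
        self.buckets[key].append(item)
        if key < self.cursor:
            self.cursor = key  # rewind if a smaller key arrives
        self.size += 1

    def pop_min(self):
        # Advance the cursor to the first non-empty bucket.
        while not self.buckets[self.cursor]:
            self.cursor += 1
        self.size -= 1
        return self.buckets[self.cursor].pop()
```

In contrast, a tree-structured OPEN list (e.g., a binary heap or balanced tree) pays $O(\log N)$ per selection regardless of key density, which is the cost the paper's bilevel scheme amortizes away.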