$\varepsilon$-Good Action Identification in Fixed-Budget Monte Carlo Tree Search

📅 2026-05-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the problem of identifying an $\varepsilon$-optimal action in depth-2 max-min trees under a fixed sampling budget, where the goal is to recommend a subtree whose minimal leaf value is within $\varepsilon$ of the optimal maximin value. The paper proposes the first $\varepsilon$-agnostic algorithm that requires no prior knowledge of $\varepsilon$ and achieves instance-dependent error guarantees for any valid $\varepsilon$. By integrating Monte Carlo tree search with halving-style and Successive Rejects strategies from multi-armed bandits, the method introduces a novel complexity measure, $H_2(\varepsilon)$, which captures both inter-subtree and intra-subtree hardness. Theoretical analysis shows that the probability of misidentification decays exponentially as $\exp(-\widetilde{\Theta}(T/H_2(\varepsilon)))$, providing the first provable fixed-budget performance guarantee for max-min action identification and revealing fundamental differences in difficulty structure compared to classical multi-armed bandit problems.

📝 Abstract

We study the fixed-budget max-min action identification problem in depth-2 max-min trees, an important special case of Monte Carlo Tree Search. A learner sequentially allocates $T$ samples to leaves and then recommends a subtree whose minimum leaf value is largest. Motivated by approximate planning, we focus on $\varepsilon$-good subtree identification, where any subtree whose min value is within $\varepsilon$ of the optimal maximin value is acceptable. Our main contribution is an $\varepsilon$-agnostic algorithm: it does not require $\varepsilon$ as input, but achieves instance-dependent error bounds for every meaningful $\varepsilon$. We show that the misidentification probability decays as $\exp(-\widetilde{\Theta}(T/H_2(\varepsilon)))$, where $H_2(\varepsilon)$ captures both cross-subtree and within-subtree gaps. When each subtree has a single leaf, the problem reduces to standard fixed-budget best-arm identification, and our analysis recovers, up to accelerating factors, known $\varepsilon$-good guarantees for halving-style methods while giving a new $\varepsilon$-good guarantee for Successive Rejects. On the lower-bound side, we provide complementary positive and negative results showing that max-min identification has a different hardness structure from standard $K$-armed bandits. To our knowledge, this is the first provable fixed-budget algorithmic guarantee for max-min action identification.
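To make the problem statement concrete, the following is a minimal Python sketch of the setting only, not of the paper's algorithm. It assumes a depth-2 tree stored as a list of subtrees, each a list of leaf means, and Gaussian sampling noise. The function names (`maximin_value`, `eps_good_subtrees`, `uniform_recommend`) and the uniform-allocation baseline are hypothetical illustrations that do not appear in the source.

```python
import random


def maximin_value(means):
    """Optimal maximin value: the largest minimum leaf mean over all subtrees."""
    return max(min(leaves) for leaves in means)


def eps_good_subtrees(means, eps):
    """Indices of subtrees whose minimum leaf mean is within eps of the maximin value."""
    v_star = maximin_value(means)
    return [i for i, leaves in enumerate(means) if min(leaves) >= v_star - eps]


def uniform_recommend(means, budget, rng, noise_sd=1.0):
    """Naive baseline (an assumption for illustration, NOT the paper's method):
    split the budget evenly over all leaves, then recommend the subtree
    with the largest empirical minimum leaf value."""
    n_leaves = sum(len(leaves) for leaves in means)
    per_leaf = budget // n_leaves
    if per_leaf < 1:
        raise ValueError("budget must allow at least one sample per leaf")
    estimates = [
        [sum(rng.gauss(mu, noise_sd) for _ in range(per_leaf)) / per_leaf
         for mu in leaves]
        for leaves in means
    ]
    return max(range(len(estimates)), key=lambda i: min(estimates[i]))


if __name__ == "__main__":
    # Three subtrees with two leaves each; the minimum leaf means are 0.5, 0.55, 0.3.
    tree = [[0.9, 0.5], [0.6, 0.55], [0.3, 0.8]]
    good = eps_good_subtrees(tree, eps=0.1)
    pick = uniform_recommend(tree, budget=600, rng=random.Random(0))
    print("eps-good subtrees:", good)
    print("recommended:", pick, "| eps-good:", pick in good)
```

A recommendation counts as a misidentification when the returned index falls outside `eps_good_subtrees(means, eps)`. When every subtree holds a single leaf, `min(leaves)` is just that leaf's mean, which matches the reduction to standard best-arm identification mentioned in the abstract.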

Problem

Research questions and friction points this paper is trying to address.

max-min action identification

fixed-budget

Monte Carlo Tree Search

ε-good identification

depth-2 trees

Innovation

Methods, ideas, or system contributions that make the work stand out.

ε-good identification

fixed-budget MCTS

max-min tree