Polynomial Regret Concentration of UCB for Non-Deterministic State Transitions

📅 2025-02-09
🤖 AI Summary
This paper addresses the lack of theoretical guarantees for Monte Carlo Tree Search (MCTS) under stochastic state transitions, i.e., in environments where an action leads to a random next state. Building on the modeling approach of Shah et al. (2020), we extend the analysis of MCTS to multi-armed bandit problems with stochastic transitions, combining stochastic process analysis with Upper Confidence Bound (UCB) principles. Our main result is a polynomial regret concentration bound for the UCB-based node selection rule in stochastic MCTS. This removes the deterministic-transition assumption underlying the earlier analysis. As a consequence, MCTS obtains robustness and convergence guarantees in real-world applications with probabilistic outcomes, such as autonomous decision-making and financial modeling.
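To make the node selection rule concrete, the standard UCB1 index and a polynomial-bonus variant can be written as below. The polynomial form, and the tail condition it is calibrated against, are recalled from the line of work of Shah et al. (2020) rather than quoted from this paper. Treat the exact constants and admissible ranges of alpha, beta, xi and eta as assumptions to check against the original.

```latex
% Notation: \bar{X}_{i,n_i} = empirical mean reward of action i after n_i selections,
% t = total number of selections made so far at the node.

% UCB1 (logarithmic exploration bonus)
a_t = \arg\max_i \left( \bar{X}_{i,n_i} + \sqrt{\frac{2 \ln t}{n_i}} \right)

% Polynomial-bonus UCB (assumed form; constants \alpha, \beta, \xi > 0 and \eta < 1)
a_t = \arg\max_i \left( \bar{X}_{i,n_i} + \frac{\beta^{1/\xi}\, t^{\alpha/\xi}}{n_i^{1-\eta}} \right)

% Polynomial concentration assumed for the rewards (tail decays as a power of z)
\Pr\!\left( \left| \bar{X}_{i,n} - \mu_i \right| \ge z\, n^{\eta-1} \right) \le \beta\, z^{-\xi}
```

Substituting z = beta^(1/xi) t^(alpha/xi) into the tail bound gives a failure probability of beta z^(-xi) = t^(-alpha). This is why the exploration bonus takes this polynomial shape rather than the logarithmic one of UCB1.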

📝 Abstract
Monte Carlo Tree Search (MCTS) has proven effective in solving decision-making problems in perfect information settings. However, its application to stochastic and imperfect information domains remains limited. This paper extends the theoretical framework of MCTS to stochastic domains by addressing non-deterministic state transitions, where actions lead to probabilistic outcomes. Specifically, building on the work of Shah et al. (2020), we derive polynomial regret concentration bounds for the Upper Confidence Bound algorithm in multi-armed bandit problems with stochastic transitions, offering improved theoretical guarantees. Our primary contribution is proving that these bounds also apply to non-deterministic environments, ensuring robust performance in stochastic settings. This broadens the applicability of MCTS to real-world decision-making problems with probabilistic outcomes, such as in autonomous systems and financial decision-making.
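The following is a minimal Python sketch, under stated assumptions, of UCB selection over actions with non-deterministic transitions. Each action leads to a random next state whose value is fixed, so an action's reward is random even though the state values are not. The transition table, the helper names `sample_outcome`, `poly_ucb_index` and `run_ucb`, and the parameter values `alpha`, `beta`, `xi`, `eta` are hypothetical illustrations, not the paper's setup or code. Because the bonus grows polynomially in `t`, no action is starved, and the action with the highest expected value (`a1`) should receive most of the selections.

```python
import random

# Hypothetical one-step decision problem: each action leads to a random next
# state, and each next state has a fixed value. Pairs are (probability, value).
TRANSITIONS = {
    "a0": [(0.5, 1.0), (0.5, 0.0)],  # expected value 0.50
    "a1": [(0.8, 1.0), (0.2, 0.0)],  # expected value 0.80 (best action)
    "a2": [(0.3, 1.0), (0.7, 0.2)],  # expected value 0.44
}


def sample_outcome(action, rng):
    """Sample the value of the next state reached by taking `action`."""
    probs, values = zip(*TRANSITIONS[action])
    return rng.choices(values, weights=probs, k=1)[0]


def poly_ucb_index(mean, pulls, t, alpha=1.0, beta=1.0, xi=4.0, eta=0.5):
    """Empirical mean plus a polynomial exploration bonus
    beta^(1/xi) * t^(alpha/xi) / pulls^(1 - eta). Parameter values are illustrative."""
    return mean + beta ** (1.0 / xi) * t ** (alpha / xi) / pulls ** (1.0 - eta)


def run_ucb(n_rounds, seed=0):
    """Run polynomial-bonus UCB for n_rounds; return selection counts and reward sums."""
    rng = random.Random(seed)
    pulls = {a: 0 for a in TRANSITIONS}
    totals = {a: 0.0 for a in TRANSITIONS}
    for t in range(1, n_rounds + 1):
        untried = [a for a in TRANSITIONS if pulls[a] == 0]
        if untried:
            action = untried[0]  # select every action once before using the index
        else:
            action = max(
                TRANSITIONS,
                key=lambda a: poly_ucb_index(totals[a] / pulls[a], pulls[a], t),
            )
        reward = sample_outcome(action, rng)
        pulls[action] += 1
        totals[action] += reward
    return pulls, totals


pulls, totals = run_ucb(5000)
for a in TRANSITIONS:
    print(a, pulls[a], round(totals[a] / pulls[a], 3))
```

Comparing against logarithmic UCB1 only requires changing the bonus term inside `poly_ucb_index`.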
Problem

Extends the theoretical analysis of MCTS to stochastic domains
Proves polynomial regret concentration bounds for UCB
Shows that the bounds hold under non-deterministic state transitions
Innovation

Extends the MCTS analysis of Shah et al. (2020)
Derives polynomial regret concentration bounds for UCB
Applies to environments with stochastic transitions
Can Comer
Computer Science Department, Technical University Darmstadt, Hessian Center for Artificial Intelligence (hessian.AI)
Jannis Bluml
Computer Science Department, Technical University Darmstadt, Hessian Center for Artificial Intelligence (hessian.AI)
Cedric Derstroff
PhD student, Technische Universität Darmstadt
Reinforcement Learning
Kristian Kersting
Professor of AI & ML, Technical University of Darmstadt, Hessian.ai, DFKI, CAIRNE/ELLIS, AAAI Fellow
Artificial Intelligence, Neurosymbolic AI, Probabilistic Circuits, Machine Learning