🤖 AI Summary
To address the challenge of aggregating statistics across threads in root-parallel Monte Carlo tree search (MCTS) over continuous action spaces, this paper introduces Gaussian process regression (GPR) into the root-parallel MCTS framework. The proposed GPR-based value estimation method extrapolates values to promising actions that no thread sampled and explicitly models the continuity of the action space. It fuses the local statistics of multiple threads while remaining efficient enough for online planning. Evaluated on six domains, it outperforms existing aggregation strategies (including weighted averaging and max-value aggregation) in policy quality and planning stability, at the cost of a modest increase in inference time. The core contribution is the application of GPR to cross-thread value aggregation in root-parallel MCTS for online planning in continuous action domains.
📝 Abstract
Monte Carlo Tree Search is a cornerstone algorithm for online planning, and its root-parallel variant is widely used when wall-clock time is limited but the best possible performance is desired. In environments with continuous action spaces, how best to aggregate statistics from different threads is an important yet underexplored question. In this work, we introduce a method that uses Gaussian Process Regression to obtain value estimates for promising actions that were not trialed in the environment. We perform a systematic evaluation across six different domains, demonstrating that our approach outperforms existing aggregation strategies while requiring only a modest increase in inference time.
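The following is a minimal sketch of what GPR-based aggregation of root statistics could look like; it is not the authors' implementation. It assumes a one-dimensional action space, a squared-exponential kernel, and observation noise that shrinks with visit count. The function names (`rbf_kernel`, `gpr_aggregate`), the hyperparameters, and the example data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.3):
    """Squared-exponential kernel (unit signal variance) between 1-D action sets."""
    a = np.asarray(a, dtype=float).reshape(-1, 1)
    b = np.asarray(b, dtype=float).reshape(1, -1)
    return np.exp(-0.5 * ((a - b) / length_scale) ** 2)


def gpr_aggregate(thread_stats, candidates, length_scale=0.3, base_noise=0.1):
    """Pool root statistics from all threads and predict values at candidate actions.

    thread_stats: one (actions, mean_values, visit_counts) tuple per thread.
    Returns the GP posterior mean and variance at each candidate action.
    """
    actions = np.concatenate([np.asarray(s[0], dtype=float) for s in thread_stats])
    values = np.concatenate([np.asarray(s[1], dtype=float) for s in thread_stats])
    visits = np.concatenate([np.asarray(s[2], dtype=float) for s in thread_stats])
    visits = np.maximum(visits, 1.0)

    # Assumption: the noise of a mean-value estimate shrinks with its visit count.
    noise = base_noise / visits

    # Constant prior mean: the visit-weighted average of all observed values.
    prior_mean = np.average(values, weights=visits)

    K = rbf_kernel(actions, actions, length_scale) + np.diag(noise)
    K_star = rbf_kernel(candidates, actions, length_scale)

    # Standard GP posterior computed through a Cholesky factorization of K.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, values - prior_mean))
    mean = prior_mean + K_star @ alpha
    v = np.linalg.solve(L, K_star.T)
    var = np.maximum(1.0 - np.sum(v ** 2, axis=0), 0.0)
    return mean, var


# Hypothetical root statistics from two search threads.
thread_stats = [
    ([-0.8, -0.2, 0.5], [0.1, 0.6, 0.4], [10, 30, 20]),
    ([-0.5, 0.1, 0.9], [0.3, 0.7, 0.0], [15, 40, 5]),
]
candidates = np.linspace(-1.0, 1.0, 41)  # includes actions no thread tried
mean, var = gpr_aggregate(thread_stats, candidates)
best_action = candidates[np.argmax(mean)]
```

Here the final action is chosen by the highest posterior mean over a grid of candidates, so an action that no thread sampled can still be selected if nearby sampled actions suggest it is valuable. Other selection rules, such as penalizing high posterior variance, would fit the same structure.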