Gaussian Process Aggregation for Root-Parallel Monte Carlo Tree Search with Continuous Actions

📅 2025-12-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses how to aggregate statistics across threads in root-parallel Monte Carlo tree search (MCTS) over continuous action spaces by introducing Gaussian process regression (GPR) into the root-parallel MCTS framework. The proposed GPR-based value estimator explicitly models the continuity of the action space, which lets it extrapolate values to actions that were never sampled while fusing the local statistics of all threads and preserving online planning efficiency. Evaluated on six standard continuous control benchmarks, it outperforms existing aggregation strategies, including weighted averaging and max-value aggregation, in policy quality and planning stability, at the cost of a modest increase in inference time. The core contribution is the novel application of GPR to cross-thread value aggregation in root-parallel MCTS for online planning in continuous action domains.

📝 Abstract

Monte Carlo Tree Search is a cornerstone algorithm for online planning, and its root-parallel variant is widely used when wall-clock time is limited but the best possible performance is desired. In environments with continuous action spaces, how best to aggregate statistics from different threads is an important yet underexplored question. In this work, we introduce a method that uses Gaussian Process Regression to obtain value estimates for promising actions that were not trialed in the environment. We perform a systematic evaluation across six domains, demonstrating that our approach outperforms existing aggregation strategies while requiring only a modest increase in inference time.
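To make the idea of GPR-based aggregation concrete, here is a minimal sketch, not the authors' implementation. It assumes a one-dimensional action space, a squared-exponential kernel, and observation noise scaled by the inverse visit count. The names `gp_aggregate`, `rbf_kernel`, and `solve_linear`, the hyperparameters, and the per-thread statistics are all hypothetical. Each thread reports `(action, mean_return, visits)` tuples for its root children; the sketch pools them and evaluates the standard GP posterior mean on a grid of candidate actions, most of which no thread tried.

```python
import math

def rbf_kernel(a, b, length_scale):
    """Squared-exponential kernel on scalar actions (assumed kernel choice)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        residual = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = residual / M[i][i]
    return x

def gp_aggregate(thread_stats, candidates, length_scale=0.3, noise=0.1):
    """Return the GP posterior mean value at each candidate action.

    thread_stats: one list per thread of (action, mean_return, visits).
    Hyperparameters are illustrative assumptions, not tuned values.
    """
    pooled = [stat for stats in thread_stats for stat in stats]
    actions = [a for a, _, _ in pooled]
    values = [v for _, v, _ in pooled]
    prior_mean = sum(values) / len(values)  # constant prior mean
    n = len(pooled)
    K = [[rbf_kernel(actions[i], actions[j], length_scale) for j in range(n)]
         for i in range(n)]
    # Assumption: frequently visited actions get less observation noise.
    for i, (_, _, visits) in enumerate(pooled):
        K[i][i] += noise / visits
    weights = solve_linear(K, [v - prior_mean for v in values])
    return [
        prior_mean + sum(rbf_kernel(c, a, length_scale) * w
                         for a, w in zip(actions, weights))
        for c in candidates
    ]

# Hypothetical root statistics from two search threads.
thread_stats = [
    [(-0.8, 0.1, 10), (0.2, 0.9, 30)],  # thread 0
    [(0.4, 0.8, 25), (0.9, 0.2, 12)],   # thread 1
]
candidates = [i / 10 for i in range(-10, 11)]  # grid over [-1, 1]
means = gp_aggregate(thread_stats, candidates)
best_value, best_action = max(zip(means, candidates))
print(f"selected action {best_action:+.1f} with estimated value {best_value:.3f}")
```

Because the kernel correlates nearby actions, every candidate on the grid receives an estimate informed by all threads, which simple averaging over actions sampled in common cannot provide when threads rarely sample the same continuous action. In practice the hand-written solver would likely be replaced by a numerical library, and the kernel and noise model would need tuning for each domain.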

Problem

Research questions and friction points this paper is trying to address.

How to aggregate statistics from parallel threads when the action space is continuous

How to estimate the value of promising actions that were never tried in the environment

How to improve on existing aggregation strategies with only a modest increase in inference time

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Gaussian Process Regression for value estimation

Pools statistics from all parallel threads to estimate values for untried actions

Outperforms existing aggregation strategies across six domains with a modest increase in inference time
