Parallelizing Tree Search with Twice Sequential Monte Carlo

📅 2025-11-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In model-based reinforcement learning, Sequential Monte Carlo (SMC) tree search suffers from high variance and severe path degeneracy, so it scales poorly as search depth increases. To address these issues, the authors propose Twice Sequential Monte Carlo Tree Search (TSMCTS), an algorithm featuring a dual sequential sampling and resampling mechanism. TSMCTS preserves the inherent parallelism of SMC while reducing variance accumulation and particle degeneracy, which lets it scale favorably with increasing search depth. It combines importance reweighting with GPU-friendly parallelism to support large-scale tree search. Empirical evaluation shows that TSMCTS outperforms the SMC baseline and a popular modern version of MCTS on both discrete and continuous benchmarks.

📝 Abstract

Model-based reinforcement learning (RL) methods that leverage search are responsible for many milestone breakthroughs in RL. Sequential Monte Carlo (SMC) recently emerged as an alternative to the Monte Carlo Tree Search (MCTS) algorithm which drove these breakthroughs. SMC is easier to parallelize and more suitable to GPU acceleration. However, it also suffers from large variance and path degeneracy which prevent it from scaling well with increased search depth, i.e., increased sequential compute. To address these problems, we introduce Twice Sequential Monte Carlo Tree Search (TSMCTS). Across discrete and continuous environments TSMCTS outperforms the SMC baseline as well as a popular modern version of MCTS. Through variance reduction and mitigation of path degeneracy, TSMCTS scales favorably with sequential compute while retaining the properties that make SMC natural to parallelize.
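To make "path degeneracy" concrete, the following is a minimal sketch of generic SMC-style multinomial resampling, not the TSMCTS algorithm itself. It tracks how many distinct depth-0 particles still have descendants after each resampling step. The function name `count_surviving_roots` and the random placeholder weights are assumptions for illustration; a real search would derive weights from rewards or value estimates.

```python
import random


def count_surviving_roots(num_particles, depth, seed=0):
    """Count distinct depth-0 ancestors remaining after each resampling step.

    Illustrative only: the weights are random placeholders, not learned
    values or policy ratios.
    """
    rng = random.Random(seed)
    # ancestors[i] is the index of the depth-0 particle that particle i descends from.
    ancestors = list(range(num_particles))
    survivors = [num_particles]
    for _ in range(depth):
        # Hypothetical importance weights for the current particle set.
        weights = [rng.random() for _ in range(num_particles)]
        # Multinomial resampling: draw parent indices in proportion to weight.
        parents = rng.choices(range(num_particles), weights=weights, k=num_particles)
        # Each new particle inherits the root ancestor of its sampled parent.
        ancestors = [ancestors[p] for p in parents]
        survivors.append(len(set(ancestors)))
    return survivors


if __name__ == "__main__":
    print(count_surviving_roots(num_particles=16, depth=50))
```

The returned counts can never increase, because resampling can only drop ancestral lines. As depth grows, the particle set typically collapses onto very few root actions, which is the degeneracy the abstract refers to.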

Problem

Research questions and friction points this paper is trying to address.

Reduces variance in Sequential Monte Carlo tree search

Mitigates path degeneracy in reinforcement learning algorithms

Enhances scalability of parallelizable tree search methods
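A common diagnostic for the weight variance mentioned above is the effective sample size (ESS), defined as the squared sum of the weights divided by the sum of squared weights. The sketch below is a standard SMC diagnostic and is not taken from the paper; the helper name `effective_sample_size` is an assumption for illustration.

```python
def effective_sample_size(weights):
    """Return (sum w)^2 / sum(w^2) for non-negative importance weights."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return total * total / sum(w * w for w in weights)


if __name__ == "__main__":
    # Uniform weights: every particle contributes equally.
    print(effective_sample_size([1.0, 1.0, 1.0, 1.0]))
    # One dominant particle: the set behaves like a single sample.
    print(effective_sample_size([1.0, 0.0, 0.0, 0.0]))
```

ESS equals the number of particles when the weights are uniform and falls toward 1 as a single particle dominates, so a low ESS signals high-variance estimates.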

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces Twice Sequential Monte Carlo Tree Search

Reduces variance and mitigates path degeneracy

Retains parallelization properties while scaling compute

🔎 Similar Papers

Separate Generation and Evaluation for Parallel Greedy Best-First Search