Doubly Robust Monte Carlo Tree Search

📅 2025-02-01
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This paper addresses the low sample efficiency and poor decision quality of Monte Carlo Tree Search (MCTS) in complex environments. To this end, it introduces Doubly Robust (DR) off-policy estimation into the MCTS framework, yielding a hybrid estimator that combines MCTS rollouts with DR estimation and is guaranteed to be unbiased and to reduce variance under specified conditions. DR-MCTS improves sample efficiency across language-model scales, including in the partially observable VirtualHome environment. Experiments show: (i) an 88% win rate in Tic-Tac-Toe versus 10% for standard MCTS, a gain of 78 percentage points; (ii) a 20.7% success rate on compound VirtualHome tasks versus 10.3% for standard MCTS, roughly double the baseline; and (iii) DR-MCTS with a smaller language model outperforming standard MCTS with larger models. The core contribution is a theoretically grounded DR-MCTS algorithm that improves both policy evaluation accuracy and data utilization efficiency.
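The unbiasedness and variance-reduction claims can be illustrated with the standard one-step doubly robust estimator from the off-policy evaluation literature. This is a minimal sketch, not the paper's code: the three-action problem, the behavior policy `MU`, the target policy `PI`, the deliberately imperfect reward model `Q_HAT`, and all numbers are hypothetical. Because the behavior probabilities are known exactly, both estimators are unbiased, but DR subtracts the model's prediction before reweighting, so its variance is much smaller than that of plain importance sampling (IS).

```python
import random
import statistics

# Hypothetical one-step problem (not from the paper): three actions.
TRUE_Q = [1.0, 2.0, 3.0]    # true mean reward of each action
MU = [1 / 3, 1 / 3, 1 / 3]  # behavior policy that generated the data
PI = [0.1, 0.2, 0.7]        # target policy we want to evaluate
Q_HAT = [1.2, 1.8, 2.7]     # deliberately imperfect reward model


def sample(rng):
    """Draw one (action, reward) pair under the behavior policy."""
    a = rng.choices([0, 1, 2], weights=MU, k=1)[0]
    r = TRUE_Q[a] + rng.gauss(0.0, 1.0)  # unit-variance reward noise
    return a, r


def is_estimate(a, r):
    """Importance sampling: reweight the observed reward by pi/mu."""
    return PI[a] / MU[a] * r


def dr_estimate(a, r):
    """Doubly robust: model baseline plus reweighted model residual."""
    baseline = sum(p * q for p, q in zip(PI, Q_HAT))
    return baseline + PI[a] / MU[a] * (r - Q_HAT[a])


def run(n=100_000, seed=0):
    rng = random.Random(seed)
    data = [sample(rng) for _ in range(n)]
    return [is_estimate(a, r) for a, r in data], [dr_estimate(a, r) for a, r in data]


true_value = sum(p * q for p, q in zip(PI, TRUE_Q))
is_vals, dr_vals = run()
print(f"true {true_value:.3f}")
print(f"IS   mean {statistics.fmean(is_vals):.3f}  var {statistics.pvariance(is_vals):.3f}")
print(f"DR   mean {statistics.fmean(dr_vals):.3f}  var {statistics.pvariance(dr_vals):.3f}")
```

With these numbers the true value is 2.6; analytically the IS variance is 8.6 and the DR variance about 1.71, so the printed sample statistics should land close to those figures.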

πŸ“ Abstract
We present Doubly Robust Monte Carlo Tree Search (DR-MCTS), a novel algorithm that integrates Doubly Robust (DR) off-policy estimation into Monte Carlo Tree Search (MCTS) to enhance sample efficiency and decision quality in complex environments. Our approach introduces a hybrid estimator that combines MCTS rollouts with DR estimation, offering theoretical guarantees of unbiasedness and variance reduction under specified conditions. Empirical evaluations in Tic-Tac-Toe and the partially observable VirtualHome environment demonstrate DR-MCTS's superior performance over standard MCTS. In Tic-Tac-Toe, DR-MCTS achieves an 88% win rate compared to a 10% win rate for standard MCTS. In compound VirtualHome tasks, DR-MCTS attains a 20.7% success rate versus 10.3% for standard MCTS. Our scaling analysis reveals that DR-MCTS exhibits better sample efficiency, notably outperforming standard MCTS with larger language models while using a smaller model. These results underscore DR-MCTS's potential for efficient decision-making in complex, real-world scenarios where sample efficiency is paramount.
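The abstract describes a hybrid estimator that combines MCTS rollouts with DR estimation. A minimal sketch of one common multi-step form, the recursive per-decision DR estimator from the off-policy reinforcement learning literature, is shown below together with a convex mix of a rollout return and the DR value. Whether DR-MCTS uses exactly this recursion, and how it sets the mixing weight `beta`, are assumptions here; the function names, the `(state, action, reward)` trajectory format, and the callable policy and Q-model interfaces are all hypothetical.

```python
from collections.abc import Callable, Hashable, Sequence

# Hypothetical interfaces (not from the paper).
State = Hashable
Action = Hashable
Policy = Callable[[State], dict]        # state -> {action: probability}
QModel = Callable[[State, Action], float]


def v_hat(q_hat: QModel, pi: Policy, s: State) -> float:
    """Model-based state value: expectation of q_hat under the target policy."""
    return sum(p * q_hat(s, a) for a, p in pi(s).items())


def dr_value(traj: Sequence[tuple], pi: Policy, mu: Policy,
             q_hat: QModel, gamma: float = 1.0) -> float:
    """Recursive per-decision DR estimate of the target policy's value at traj[0].

    Works backwards: v_t = V_hat(s_t) + rho_t * (r_t + gamma * v_{t+1} - Q_hat(s_t, a_t)).
    """
    v = 0.0
    for s, a, r in reversed(traj):
        rho = pi(s).get(a, 0.0) / mu(s)[a]  # per-step importance ratio
        v = v_hat(q_hat, pi, s) + rho * (r + gamma * v - q_hat(s, a))
    return v


def hybrid_value(v_mcts: float, v_dr: float, beta: float) -> float:
    """Convex mix of a plain rollout return and the DR estimate (beta is hypothetical)."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return beta * v_mcts + (1.0 - beta) * v_dr


if __name__ == "__main__":
    pi = lambda s: {"R": 1.0}            # target policy: always "R"
    mu = lambda s: {"L": 0.5, "R": 0.5}  # behavior policy: uniform
    traj = [("s0", "R", 1.0), ("s1", "R", 1.0)]
    print(dr_value(traj, pi, mu, lambda s, a: 0.0))  # 6.0
```

With `q_hat` identically zero the recursion reduces to per-decision importance sampling (here 2 · 1 + 4 · 1 = 6); the better `q_hat` matches the true action values, the smaller the importance-weighted corrections become.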

Problem

Research questions and friction points this paper is trying to address.

Enhances sample efficiency in complex environments
Improves decision quality using a hybrid estimator
Outperforms standard MCTS in empirical evaluations

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integration of Doubly Robust off-policy estimation into MCTS
Hybrid estimator combining MCTS rollouts with DR estimation
Enhanced sample efficiency
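To show where a hybrid estimate could enter the search, the sketch below backs a leaf's hybrid value up a root-to-leaf path and selects children with standard UCT. This is an assumption about the integration, not the paper's procedure: `Node`, `backup`, `uct_select`, the default `beta=0.5`, and the exploration constant `c=1.4` are hypothetical. Values are treated from a single player's perspective, so a two-player game such as Tic-Tac-Toe would also need to flip the sign at alternating depths.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node storing visit count and summed value estimates."""
    visits: int = 0
    value_sum: float = 0.0
    children: dict = field(default_factory=dict)  # action -> Node

    @property
    def q(self) -> float:
        """Mean of all leaf estimates backed up through this node."""
        return self.value_sum / self.visits if self.visits else 0.0


def backup(path, v_mcts, v_dr, beta=0.5):
    """Mix rollout and DR estimates (hypothetical beta), then add to every node on the path."""
    value = beta * v_mcts + (1.0 - beta) * v_dr
    for node in path:
        node.visits += 1
        node.value_sum += value


def uct_select(node, c=1.4):
    """Standard UCT child selection over the hybrid-averaged q values."""
    log_n = math.log(max(node.visits, 1))  # guard against log(0) at a fresh node

    def score(child):
        if child.visits == 0:
            return math.inf  # always try unvisited children first
        return child.q + c * math.sqrt(log_n / child.visits)

    return max(node.children.items(), key=lambda kv: score(kv[1]))[0]
```

Because the backup only changes which number is averaged into `value_sum`, the rest of the MCTS loop (selection, expansion, rollout) can stay unchanged in this sketch.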