MAPLE: Multi-State Aggregated Policy Evaluation for AlphaZero in Imperfect-Information Games

📅 2026-05-22

🤖 AI Summary

This work addresses the challenge of extending AlphaZero to imperfect-information games. Direct application is hindered by two problems in existing approaches: strategy fusion in Perfect Information Monte Carlo (PIMC), and the high computational cost of combining Information Set Monte Carlo Tree Search (IS-MCTS) with neural networks. The authors propose MAPLE, a method that aggregates policy and value estimates from multiple sampled world states within a single search tree, combining the strengths of both paradigms. To improve efficiency, MAPLE adds a Siamese-network-based sampling mechanism that selects informative world states from the information set. This mitigates strategy fusion while keeping computation tractable. In experiments, MAPLE outperforms the PIMC-based AlphaZero baseline by 291 Elo on Phantom Go and 136 Elo on Dark Hex.
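To give the reported Elo gains a more concrete reading, the sketch below converts a rating difference into an expected score against the baseline. It assumes the standard logistic Elo model with a 400-point scale; the summary does not say exactly how the paper computes its ratings, and `elo_expected_score` is a hypothetical helper name.

```python
def elo_expected_score(diff: float) -> float:
    """Expected score of the higher-rated side, assuming the standard
    logistic Elo model (400-point scale). Not taken from the paper."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


if __name__ == "__main__":
    # Elo improvements over the PIMC-based baseline, as reported in the abstract.
    for game, gain in [("Phantom Go", 291), ("Dark Hex", 136)]:
        print(f"{game}: +{gain} Elo -> expected score {elo_expected_score(gain):.3f}")
```

Under this assumption, +291 Elo corresponds to an expected score of roughly 0.84 against the baseline and +136 Elo to roughly 0.69.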

📝 Abstract

Imperfect-information games (IIGs) are challenging, as players must make decisions without fully observing the true game state. While AlphaZero has achieved remarkable success in perfect-information games, extending it to IIGs remains difficult. Existing search-based approaches, such as Perfect Information Monte Carlo (PIMC), suffer from strategy fusion, while Information Set Monte Carlo Tree Search (IS-MCTS) incurs high computational cost when combined with neural networks. In this paper, we propose Multi-State Aggregated PoLicy Evaluation (MAPLE), a tree search method that aggregates policy and value evaluations from multiple sampled world states within a single search tree, combining the advantages of PIMC and IS-MCTS while maintaining a controllable computational cost. We further incorporate a Siamese-based sampling strategy to select informative world states from the information set. Experiments on Phantom Go and Dark Hex show that MAPLE significantly outperforms the PIMC-based AlphaZero baseline, achieving Elo improvements of 291 and 136, respectively. These results demonstrate that MAPLE is an effective approach for AlphaZero-style learning in imperfect-information games.
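The central idea of the abstract is to evaluate several sampled world states at the same tree node and pool the results. A minimal sketch of that pooling step follows. It is an illustration under stated assumptions, not the authors' implementation. The names `aggregate_evaluations` and `toy_evaluate`, the optional per-state weights, and the weighted-mean pooling rule are all hypothetical. The abstract does not specify how MAPLE combines evaluations or how its Siamese sampler picks states.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# Hypothetical evaluator signature: a fully specified world state maps to
# (prior over actions, scalar value from the acting player's view).
Evaluator = Callable[[object], Tuple[Dict[str, float], float]]


def aggregate_evaluations(
    states: Sequence[object],
    evaluate: Evaluator,
    weights: Optional[Sequence[float]] = None,
) -> Tuple[Dict[str, float], float]:
    """Pool network outputs from several sampled world states into one
    prior and one value for a single search-tree node.
    Assumption: pooling is a weighted mean (uniform weights by default)."""
    if not states:
        raise ValueError("need at least one sampled world state")
    if weights is None:
        weights = [1.0] * len(states)
    if len(weights) != len(states):
        raise ValueError("weights and states must have the same length")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive number")

    prior: Dict[str, float] = {}
    value = 0.0
    for state, w in zip(states, weights):
        policy, v = evaluate(state)  # one network call per sampled world
        for action, p in policy.items():
            prior[action] = prior.get(action, 0.0) + w * p / total
        value += w * v / total

    # Renormalize so the pooled prior sums to 1 even when an action is
    # missing from some world's policy (e.g. illegal in that world).
    z = sum(prior.values())
    return {a: p / z for a, p in prior.items()}, value


def toy_evaluate(state: object) -> Tuple[Dict[str, float], float]:
    # Hypothetical stand-in for a neural network: two worlds disagree
    # on the best move and on the position's value.
    if state == "world_A":
        return {"left": 0.8, "right": 0.2}, 1.0
    return {"left": 0.2, "right": 0.8}, -0.5


if __name__ == "__main__":
    prior, value = aggregate_evaluations(["world_A", "world_B"], toy_evaluate)
    print(prior, value)
```

In this sketch, the pooled prior would feed the node's selection rule (for example an AlphaZero-style PUCT term) and the pooled value would be backed up, so one tree serves every sampled world. PIMC, by contrast, runs a separate search per world. A learned sampler could be plugged in through the `weights` argument or by choosing which `states` to pass, but that wiring is an assumption.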

Problem

Research questions and friction points this paper is trying to address.

Imperfect-information games

AlphaZero

Policy evaluation

Monte Carlo Tree Search

Strategy fusion

Innovation

Methods, ideas, or system contributions that make the work stand out.

Imperfect-information games

AlphaZero

Policy aggregation