Superhuman AI for Stratego Using Self-Play Reinforcement Learning and Test-Time Search

📅 2025-11-10

🤖 AI Summary

This work addresses the challenges of strategy search and high training costs in large-scale imperfect-information games, exemplified by Stratego. We propose a general self-play reinforcement learning framework tailored to hidden-information environments, integrating test-time Monte Carlo Tree Search (MCTS) with policy distillation. Methodologically, we employ a lightweight neural network architecture, efficient exploration strategies, and online opponent sampling to significantly reduce computational overhead. Crucially, without relying on any external human data, our approach achieves superhuman performance using only a few thousand dollars of compute, attaining a win rate more than 35% higher than that of top human players in standard 10×10 Stratego. To our knowledge, this is the first work to achieve stable, superhuman performance in a high-complexity imperfect-information game under low-cost training constraints. The framework establishes a scalable paradigm for strategic AI deployment in resource-constrained settings.
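As a rough illustration of self-play with online opponent sampling, the sketch below trains a softmax policy on rock-paper-scissors (a toy stand-in for Stratego) with a REINFORCE update, drawing each opponent uniformly from a pool of the learner's own past snapshots. The game, the function names (`self_play`, `softmax`), the uniform pool sampling, and the hyperparameters are all illustrative assumptions, not the paper's actual training setup.

```python
import math
import random

# Toy stand-in for Stratego: rock-paper-scissors payoff for the learner
# (0 = rock, 1 = paper, 2 = scissors).
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def self_play(iterations=2000, lr=0.05, snapshot_every=100, seed=0):
    """REINFORCE self-play against opponents sampled from a pool of past snapshots.

    Hypothetical sketch: the pool, uniform sampling, and hyperparameters are
    illustrative assumptions, not the paper's method.
    """
    rng = random.Random(seed)
    logits = [0.0, 0.0, 0.0]
    pool = [list(logits)]  # frozen copies of the learner's past policies
    for it in range(1, iterations + 1):
        opp_logits = rng.choice(pool)  # online opponent sampling (uniform here)
        probs = softmax(logits)
        opp_probs = softmax(opp_logits)
        a = rng.choices(range(3), weights=probs, k=1)[0]
        b = rng.choices(range(3), weights=opp_probs, k=1)[0]
        reward = PAYOFF[a][b]
        # Policy-gradient step: d log pi(a) / d logits = onehot(a) - probs
        for i in range(3):
            grad = (1.0 if i == a else 0.0) - probs[i]
            logits[i] += lr * reward * grad
        if it % snapshot_every == 0:
            pool.append(list(logits))  # add a snapshot to the opponent pool
    return softmax(logits), pool
```

Calling `probs, pool = self_play()` returns the learner's final action distribution and the opponent pool (the initial policy plus one snapshot every 100 iterations). Sampling past snapshots rather than always playing the latest policy is a common way to reduce cycling in self-play; how the paper actually selects opponents is not specified here.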

📝 Abstract

Few classical games have been regarded as such significant benchmarks of artificial intelligence as to have justified training costs in the millions of dollars. Among these, Stratego -- a board wargame exemplifying the challenge of strategic decision making under massive amounts of hidden information -- stands apart as a case where such efforts failed to produce performance at the level of top humans. This work establishes a step change in both performance and cost for Stratego, showing that it is now possible not only to reach the level of top humans, but to achieve vastly superhuman level -- and that doing so requires not an industrial budget, but merely a few thousand dollars. We achieved this result by developing general approaches for self-play reinforcement learning and test-time search under imperfect information.

Problem

Research questions and friction points this paper is trying to address.

Developing superhuman AI for Stratego using reinforcement learning

Overcoming the challenge of massive amounts of hidden information in strategic games

Achieving high performance with minimal computational costs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-play reinforcement learning for Stratego

Test-time search under imperfect information

Achieving superhuman performance with minimal cost
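To make "test-time search under imperfect information" concrete, the sketch below uses Perfect Information Monte Carlo (PIMC), a classic baseline that samples complete hidden states consistent with what has been observed and scores each action by its average outcome across those samples. PIMC is swapped in here purely for illustration; it is known to suffer from strategy fusion, and nothing here should be read as the paper's search algorithm. The simplified attack rule, the `BOMB` sentinel, the uniform belief, and the helper names (`attack_value`, `consistent_deals`, `pimc_choose`) are hypothetical.

```python
import random
from itertools import permutations

BOMB = 99  # hypothetical sentinel: in this toy a bomb defeats any attacker


def attack_value(my_rank, their_rank):
    """+1 if the attacker wins, 0 on a tie, -1 if it loses (simplified rules)."""
    if their_rank == BOMB:
        return -1
    if my_rank > their_rank:
        return 1
    if my_rank == their_rank:
        return 0
    return -1


def consistent_deals(hidden_positions, unrevealed_ranks, revealed):
    """All assignments of unrevealed ranks to unknown squares, revealed pieces fixed."""
    unknown = [p for p in hidden_positions if p not in revealed]
    assert len(unknown) == len(unrevealed_ranks)
    # sorted() makes the enumeration order, and hence seeded sampling, reproducible
    for perm in sorted(set(permutations(unrevealed_ranks))):
        deal = dict(revealed)
        deal.update(zip(unknown, perm))
        yield deal


def pimc_choose(my_rank, hidden_positions, unrevealed_ranks, revealed,
                num_samples=64, seed=0):
    """Pick the square to attack by averaging outcomes over sampled hidden states."""
    rng = random.Random(seed)
    deals = list(consistent_deals(hidden_positions, unrevealed_ranks, revealed))
    # Uniform belief over consistent deals (an assumption; a learned belief could replace it)
    samples = [rng.choice(deals) for _ in range(num_samples)]
    scores = {
        pos: sum(attack_value(my_rank, d[pos]) for d in samples) / num_samples
        for pos in hidden_positions
    }
    best = max(hidden_positions, key=lambda p: scores[p])
    return best, scores
```

For example, `pimc_choose(6, ["a", "b", "c"], [2, BOMB], {"a": 10})` scores the revealed 10 on square `a` at -1, while the scores for `b` and `c` always sum to zero because every sampled deal puts the 2 on one of them and the bomb on the other.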
