🤖 AI Summary
Self-play in reinforcement learning (RL) lacks a unified theoretical foundation and systematic taxonomy, particularly across multi-agent RL and game-theoretic contexts, hindering progress on handling non-transitive games, improving sample efficiency and scalability, and establishing convergence guarantees.
Method: We present the first comprehensive survey of self-play methods, establishing a unified classification framework grounded in three orthogonal dimensions: policy update mechanisms, opponent modeling strategies, and equilibrium-seeking principles. We construct a knowledge graph spanning classical to state-of-the-art algorithms and explicitly link design choices to real-world applications—including Go, poker, and large language model alignment.
Contribution/Results: Our analysis identifies key open challenges in modeling non-transitivity, improving sample efficiency and scalability, and establishing rigorous convergence properties. We propose a forward-looking research agenda centered on cooperative and adversarial general-purpose agents. This work provides the community with a canonical paradigm, authoritative reference, and principled technology roadmap for self-play research.
📝 Abstract
Self-play, characterized by agents' interactions with copies or past versions of themselves, has recently gained prominence in reinforcement learning (RL). This paper first clarifies the preliminaries of self-play, including the multi-agent reinforcement learning framework and basic game theory concepts. Then, it provides a unified framework and classifies existing self-play algorithms within this framework. Moreover, the paper bridges the gap between the algorithms and their practical implications by illustrating the role of self-play in different scenarios. Finally, the survey highlights open challenges and future research directions in self-play. This paper serves as an essential guide to the multifaceted landscape of self-play in RL.
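The defining idea above — an agent training against copies or past versions of itself — can be sketched as a minimal loop on a toy zero-sum game. This is an illustrative sketch only, not an algorithm from the survey; the opponent pool, the rock-paper-scissors payoff, and the crude weight-bump update are all assumptions chosen for brevity:

```python
import random

class Policy:
    """Toy policy over rock-paper-scissors actions (illustrative only)."""
    def __init__(self, weights=None):
        self.weights = weights or [1.0, 1.0, 1.0]  # unnormalized action weights

    def act(self, rng):
        # Sample an action proportionally to its weight.
        total = sum(self.weights)
        r = rng.random() * total
        for action, w in enumerate(self.weights):
            r -= w
            if r <= 0:
                return action
        return len(self.weights) - 1

    def copy(self):
        return Policy(list(self.weights))

def payoff(a, b):
    """Rock-paper-scissors payoff for the first player: +1 win, -1 loss, 0 tie."""
    if a == b:
        return 0
    return 1 if (a - b) % 3 == 1 else -1

def self_play_train(iterations=200, seed=0):
    """Minimal self-play loop: the learner plays snapshots of itself drawn
    from an opponent pool and reinforces actions that won (a stand-in for
    a real RL update)."""
    rng = random.Random(seed)
    learner = Policy()
    pool = [learner.copy()]              # past versions of the learner
    for t in range(iterations):
        opponent = rng.choice(pool)      # sample a past self as the opponent
        a, b = learner.act(rng), opponent.act(rng)
        if payoff(a, b) > 0:
            learner.weights[a] += 0.1    # crude positive reinforcement
        if (t + 1) % 50 == 0:
            pool.append(learner.copy())  # periodically snapshot the learner
    return learner, pool

learner, pool = self_play_train()
```

Playing against the whole pool of past selves, rather than only the latest copy, is one common device for reducing the policy cycling that non-transitive games like rock-paper-scissors induce.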