🤖 AI Summary
Self-play in reinforcement learning (RL) lacks a unified theoretical foundation and systematic taxonomy, particularly across multi-agent RL and game-theoretic contexts, hindering progress on handling non-transitive games, improving sample efficiency and scalability, and establishing convergence guarantees.
Method: We present the first comprehensive survey of self-play methods, establishing a unified classification framework grounded in three orthogonal dimensions: policy update mechanisms, opponent modeling strategies, and equilibrium-seeking principles. We construct a knowledge graph spanning classical to state-of-the-art algorithms and explicitly link design choices to real-world applications—including Go, poker, and large language model alignment.
Contribution/Results: Our analysis identifies key open challenges in modeling non-transitivity, improving sample efficiency and scalability, and establishing rigorous convergence properties. We propose a forward-looking research agenda centered on cooperative and adversarial general-purpose agents. This work provides the community with a canonical paradigm, authoritative reference, and principled technology roadmap for self-play research.
📝 Abstract
Self-play, characterized by agents' interactions with copies or past versions of themselves, has recently gained prominence in reinforcement learning (RL). This paper first clarifies the preliminaries of self-play, including the multi-agent reinforcement learning framework and basic game theory concepts. Then, it provides a unified framework and classifies existing self-play algorithms within this framework. Moreover, the paper bridges the gap between the algorithms and their practical implications by illustrating the role of self-play in different scenarios. Finally, the survey highlights open challenges and future research directions in self-play. This paper serves as an essential guide to the multifaceted landscape of self-play in RL.
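The defining idea above — an agent training against copies or past versions of itself — can be sketched as a minimal loop on a toy zero-sum game. This is an illustrative sketch only, not an algorithm from the survey; the opponent pool, the rock-paper-scissors payoff, and the crude weight-bump update are all assumptions chosen for brevity:

```python
import random

class Policy:
    """Toy policy over rock-paper-scissors actions (illustrative only)."""
    def __init__(self, weights=None):
        self.weights = weights or [1.0, 1.0, 1.0]  # unnormalized action weights

    def act(self, rng):
        # Sample an action proportionally to its weight.
        total = sum(self.weights)
        r = rng.random() * total
        for action, w in enumerate(self.weights):
            r -= w
            if r <= 0:
                return action
        return len(self.weights) - 1

    def copy(self):
        return Policy(list(self.weights))

def payoff(a, b):
    """Rock-paper-scissors payoff for the first player: +1 win, -1 loss, 0 tie."""
    if a == b:
        return 0
    return 1 if (a - b) % 3 == 1 else -1

def self_play_train(iterations=200, seed=0):
    """Minimal self-play loop: the learner plays snapshots of itself drawn
    from an opponent pool and reinforces actions that won (a stand-in for
    a real RL update)."""
    rng = random.Random(seed)
    learner = Policy()
    pool = [learner.copy()]              # past versions of the learner
    for t in range(iterations):
        opponent = rng.choice(pool)      # sample a past self as the opponent
        a, b = learner.act(rng), opponent.act(rng)
        if payoff(a, b) > 0:
            learner.weights[a] += 0.1    # crude positive reinforcement
        if (t + 1) % 50 == 0:
            pool.append(learner.copy())  # periodically snapshot the learner
    return learner, pool

learner, pool = self_play_train()
```

Playing against the whole pool of past selves, rather than only the latest copy, is one common device for reducing the policy cycling that non-transitive games like rock-paper-scissors induce.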