🤖 AI Summary
This work addresses regret minimization across distributions of similar two-player zero-sum games in the self-play setting. Unlike conventional approaches that solve each game in isolation and decompose regret locally at each decision state, it proposes the first meta-learning framework designed for self-play, with a global state-wise information integration mechanism that enables cross-game policy transfer and holistic regret modeling. The method combines offline meta-learning, self-play regret minimization, and cross-game policy transfer. Empirical evaluation on normal-form games and Texas Hold'em river subgames shows significant improvements over state-of-the-art algorithms, including CFR, DREAM, and PCFR+, in both equilibrium quality and convergence speed.
📝 Abstract
Regret minimization is a general approach to online optimization that plays a crucial role in many algorithms for approximating Nash equilibria in two-player zero-sum games. The literature mainly focuses on solving individual games in isolation. In practice, however, players often encounter a distribution of similar but distinct games: for example, when trading correlated assets on the stock market, or when refining a strategy in subgames of a much larger game. Recently, offline meta-learning was used to accelerate one-sided equilibrium finding on such distributions. We build on this work, extending the framework to the more challenging self-play setting, which underlies most state-of-the-art equilibrium approximation algorithms for domains at scale. When selecting a strategy, our method integrates information across all decision states, promoting global communication in contrast to the traditional local regret decomposition. Empirical evaluation on normal-form games and river poker subgames shows that our meta-learned algorithms considerably outperform other state-of-the-art regret minimization algorithms.
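As background for the self-play regret minimization framework the abstract builds on, here is a minimal sketch of regret matching in self-play on a normal-form game (rock-paper-scissors). This is illustrative only, not the paper's meta-learned algorithm; all function names and the small random regret perturbation used to break symmetry are our own assumptions.

```python
import numpy as np

# Payoff matrix for rock-paper-scissors, from the row player's perspective.
A = np.array([[0.0, -1.0, 1.0],
              [1.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]])

def regret_matching(cum_regret):
    """Play each action proportionally to its positive cumulative regret."""
    pos = np.maximum(cum_regret, 0.0)
    total = pos.sum()
    n = len(cum_regret)
    return pos / total if total > 0 else np.full(n, 1.0 / n)

def self_play(A, iters=100_000, seed=0):
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    # Tiny random initial regrets break the symmetric fixed point (illustrative choice).
    reg_row, reg_col = rng.random(n) * 0.01, rng.random(n) * 0.01
    avg_row, avg_col = np.zeros(n), np.zeros(n)
    for _ in range(iters):
        s_row = regret_matching(reg_row)
        s_col = regret_matching(reg_col)
        # Per-action expected payoffs against the opponent's current strategy.
        u_row = A @ s_col
        u_col = -(A.T @ s_row)  # zero-sum: column player's payoff is negated
        # Accumulate instantaneous regrets (action value minus realized value).
        reg_row += u_row - s_row @ u_row
        reg_col += u_col - s_col @ u_col
        avg_row += s_row
        avg_col += s_col
    # The average strategies converge to an approximate Nash equilibrium.
    return avg_row / iters, avg_col / iters

s1, s2 = self_play(A)
```

Each iteration touches only local per-action regrets, which is exactly the "local regret decomposition" the abstract contrasts with its global, cross-state integration mechanism.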