Generative Evolutionary Meta-Solver (GEMS): Scalable Surrogate-Free Multi-Agent Learning

📅 2025-09-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
In multi-agent reinforcement learning (MARL), population-based methods such as PSRO scale poorly: storing an explicit policy population and constructing the full payoff matrix incurs quadratic computation and linear memory overhead. To address this, we propose a generative evolutionary meta-solver that represents the policy population implicitly through latent-variable anchors and a conditional generator, augmented with an adaptive expansion mechanism. This removes the need for explicit populations and payoff matrices while preserving theoretical Nash equilibrium convergence guarantees. Policy optimization combines Monte Carlo rollouts, multiplicative-weights meta-dynamics, a model-free empirical-Bernstein UCB oracle, and an advantage-based trust-region objective. Experiments across several games show up to roughly 6× speedup, 1.3× lower memory usage, and higher rewards than PSRO.

📝 Abstract
Scalable multi-agent reinforcement learning (MARL) remains a central challenge for AI. Existing population-based methods, such as Policy-Space Response Oracles (PSRO), require storing explicit policy populations and constructing full payoff matrices, incurring quadratic computation and linear memory costs. We present Generative Evolutionary Meta-Solver (GEMS), a surrogate-free framework that replaces explicit populations with a compact set of latent anchors and a single amortized generator. Instead of exhaustively constructing the payoff matrix, GEMS relies on unbiased Monte Carlo rollouts, multiplicative-weights meta-dynamics, and a model-free empirical-Bernstein UCB oracle to adaptively expand the policy set. Best responses are trained within the generator using an advantage-based trust-region objective, eliminating the need to store and train separate actors. We evaluated GEMS on a variety of two-player and multi-player games, such as the Deceptive Messages Game, Kuhn Poker, and the Multi-Particle environment. We find that GEMS is up to ~6x faster and uses 1.3x less memory than PSRO, while also achieving higher rewards. These results demonstrate that GEMS retains the game-theoretic guarantees of PSRO while overcoming its fundamental inefficiencies, enabling scalable multi-agent learning across multiple domains.
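As a rough illustration of the empirical-Bernstein UCB idea mentioned in the abstract, the sketch below computes an upper confidence bound on a candidate's mean payoff from a handful of rollout returns. It is not the paper's oracle: the function name `eb_ucb`, the parameters `delta` and `value_range`, and the constants (taken from one commonly cited form of the empirical-Bernstein inequality for bounded samples) are all assumptions made for illustration.

```python
import math
from statistics import mean, variance


def eb_ucb(samples, delta=0.05, value_range=1.0):
    """Upper confidence bound on the mean of bounded rollout returns.

    Assumes every sample lies in an interval of width `value_range`.
    The bonus has a variance-dependent term and a range-dependent term,
    so low-variance candidates get tighter bounds than Hoeffding would give.
    """
    n = len(samples)
    if n < 2:
        # Too few rollouts to estimate variance: treat as maximally uncertain.
        return math.inf
    log_term = math.log(2.0 / delta)
    sample_var = variance(samples)  # unbiased sample variance
    bonus = (math.sqrt(2.0 * sample_var * log_term / n)
             + 7.0 * value_range * log_term / (3.0 * (n - 1)))
    return mean(samples) + bonus


# Hypothetical rollout returns in [0, 1] for two candidate policies.
steady = [0.5] * 50
noisy = [0.0, 1.0] * 25
print(eb_ucb(steady), eb_ucb(noisy))
```

Both candidates have the same empirical mean, but the noisy one receives a larger bonus, which is the property that lets a UCB-style oracle spend rollouts where uncertainty is highest.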
Problem

Research questions and friction points this paper is trying to address.

Overcoming quadratic computation costs in multi-agent reinforcement learning
Eliminating linear memory requirements of population-based methods like PSRO
Enabling scalable surrogate-free learning with compact latent representations
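The scaling contrast in the points above can be made concrete with a small counting sketch. The functions and sizes below are hypothetical and not taken from the paper; they only count payoff entries and stored parameters for a two-player population of size `k` under the stated assumptions.

```python
def explicit_population_cost(k, params_per_actor):
    """Cost of a PSRO-style explicit population of k policies.

    Returns (payoff matrix entries, stored actor parameters).
    The payoff matrix pairs every policy with every other, hence k * k.
    """
    payoff_entries = k * k
    stored_params = k * params_per_actor
    return payoff_entries, stored_params


def implicit_population_cost(k, generator_params, latent_dim):
    """Stored parameters when k policies are k latent anchors fed to one generator."""
    return generator_params + k * latent_dim


# Hypothetical sizes chosen only to show the growth rates.
for k in (10, 100):
    print(k, explicit_population_cost(k, 1000), implicit_population_cost(k, 1000, 8))
```

Growing `k` from 10 to 100 multiplies the payoff entries by 100 and the explicit actor storage by 10, while the implicit representation grows only by the added anchors.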
Innovation

Methods, ideas, or system contributions that make the work stand out.

Replaces explicit populations with latent anchors and generator
Uses Monte Carlo rollouts and meta-dynamics for payoff estimation
Trains best responses via advantage-based trust-region objective
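A minimal sketch of how Monte Carlo rollouts and multiplicative-weights meta-dynamics can fit together without a full payoff matrix is shown below. It uses a toy rock-paper-scissors game as a stand-in for anchors; the names `rollout`, `mw_step`, and `self_play`, the step size `eta`, and the noise model are assumptions for illustration, not the paper's implementation.

```python
import math
import random

# Hypothetical toy game: row player's payoff in rock-paper-scissors.
RPS = [[0.0, -1.0, 1.0],
       [1.0, 0.0, -1.0],
       [-1.0, 1.0, 0.0]]


def rollout(i, j, rng, noise=0.1):
    """One noisy, unbiased Monte Carlo payoff sample for anchor i against anchor j."""
    return RPS[i][j] + rng.gauss(0.0, noise)


def mw_step(weights, payoff_estimates, eta):
    """Scale each weight by exp(eta * estimated payoff), then renormalize."""
    scaled = [w * math.exp(eta * u) for w, u in zip(weights, payoff_estimates)]
    total = sum(scaled)
    return [s / total for s in scaled]


def self_play(num_iters=2000, eta=0.05, seed=0):
    """Run the meta-dynamics and return the time-averaged mixture."""
    rng = random.Random(seed)
    k = len(RPS)
    weights = [1.0 / k] * k
    average = [0.0] * k
    for _ in range(num_iters):
        # Sample one opponent from the current mixture instead of sweeping all pairs,
        # so each iteration costs k rollouts rather than k * k.
        j = rng.choices(range(k), weights=weights)[0]
        estimates = [rollout(i, j, rng) for i in range(k)]
        weights = mw_step(weights, estimates, eta)
        average = [a + w / num_iters for a, w in zip(average, weights)]
    return average


print(self_play())
```

Sampling the opponent from the current mixture keeps each payoff estimate unbiased with respect to that mixture, which is why no pairwise payoff table needs to be stored in this sketch.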
Alakh Sharma
Birla Institute of Technology and Science, Pilani, Pilani, Rajasthan (333031)
Gaurish Trivedi
Birla Institute of Technology and Science, Pilani, Pilani, Rajasthan (333031)
Kartikey Bhandari
Birla Institute of Technology and Science, Pilani, Pilani, Rajasthan (333031)
Yash Sinha
National University of Singapore
Machine Unlearning, Software Defined Networks
Dhruv Kumar
Birla Institute of Technology and Science, Pilani, Pilani, Rajasthan (333031)
Pratik Narang
Birla Institute of Technology and Science, Pilani, Pilani, Rajasthan (333031)
Jagat Sesh Challa
Assistant Professor, Department of Computer Science & Information Systems, BITS Pilani
Big Data Analytics, Computer Vision, Federated Learning, Materials Informatics, HCI