🤖 AI Summary
In multi-agent games, opponent modeling often relies on domain-specific heuristics, and computing exact best responses is intractable in large-scale imperfect-information settings.
Method: This paper extends the Policy-Space Response Oracles (PSRO) framework by integrating Monte Carlo Tree Search (MCTS) with conditional generative sampling of world states, plus meta-strategy computation driven by the Nash Bargaining Solution (NBS). It is among the first works to bring generative modeling into MARL search, introduces two novel NBS-based meta-strategy solvers, and enables online Bayesian prediction of co-player policies.
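To make the NBS concrete: over a set of feasible joint payoffs, the Nash bargaining solution selects the outcome that maximizes the product of each player's gain over their disagreement payoff. The sketch below is a minimal illustration on hypothetical deal values, not the paper's actual meta-strategy solvers, which apply this criterion over policy populations.

```python
# Illustrative Nash Bargaining Solution on a toy negotiation.
# All deal payoffs and disagreement values below are hypothetical.

def nash_bargaining(outcomes, disagreement):
    """Return the joint payoff maximizing the Nash product
    (u1 - d1) * (u2 - d2) over individually rational outcomes."""
    d1, d2 = disagreement
    # Keep only deals each player weakly prefers to disagreement.
    feasible = [(u1, u2) for u1, u2 in outcomes if u1 >= d1 and u2 >= d2]
    return max(feasible, key=lambda u: (u[0] - d1) * (u[1] - d2))

# Hypothetical deals in a Deal-or-No-Deal-style split of items.
deals = [(9, 1), (7, 5), (4, 7), (1, 9)]
print(nash_bargaining(deals, disagreement=(1, 1)))  # (7, 5): gain product 24
```

Note how the NBS favors the balanced deal (7, 5) over lopsided splits, even though (9, 1) gives player 1 more: the product of gains rewards outcomes where both players benefit.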
Results: Evaluated on Colored Trails and Deal or No Deal, two canonical imperfect-information negotiation benchmarks, the algorithm converges toward approximate Nash equilibria. In human-subject experiments with 346 participants, it achieves social welfare comparable to human–human negotiation while improving the efficiency and fairness of human–AI deals.
📝 Abstract
Multiagent reinforcement learning (MARL) has benefited significantly from population-based and game-theoretic training regimes. One approach, Policy-Space Response Oracles (PSRO), employs standard reinforcement learning to compute response policies via approximate best responses and combines them via meta-strategy selection. We augment PSRO by adding a novel search procedure with generative sampling of world states, and introduce two new meta-strategy solvers based on the Nash bargaining solution. We evaluate PSRO's ability to compute approximate Nash equilibrium, and its performance in two negotiation games: Colored Trails, and Deal or No Deal. We conduct behavioral studies where human participants negotiate with our agents ($N = 346$). We find that search with generative modeling finds stronger policies during both training time and test time, enables online Bayesian co-player prediction, and can produce agents that achieve comparable social welfare negotiating with humans as humans trading among themselves.
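The "online Bayesian co-player prediction" the abstract mentions boils down to maintaining a posterior over a population of candidate co-player policies and updating it after each observed action. A minimal sketch, with hypothetical placeholder policies and action names (the paper's policies come from PSRO training):

```python
# Minimal sketch of online Bayesian co-player prediction:
# P(pi | action) ∝ P(action | pi, state) * P(pi).
# Policies, states, and actions here are hypothetical.

def update_posterior(prior, policies, state, action):
    """One Bayes update of the belief over candidate co-player policies."""
    likelihoods = [pi[state].get(action, 0.0) for pi in policies]
    unnorm = [p * l for p, l in zip(prior, likelihoods)]
    z = sum(unnorm)
    return [w / z for w in unnorm] if z > 0 else prior

# Two hypothetical co-player policies over actions in state "offer".
greedy = {"offer": {"demand_high": 0.9, "split_even": 0.1}}
fair   = {"offer": {"demand_high": 0.2, "split_even": 0.8}}

posterior = [0.5, 0.5]
for observed in ["split_even", "split_even"]:
    posterior = update_posterior(posterior, [greedy, fair], "offer", observed)
print(posterior)  # belief mass shifts strongly toward the "fair" policy
```

During search, such a posterior lets the agent weight MCTS rollouts by which co-player policy it most likely faces, which is how online prediction and search interact in this framework.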