A Benchmark for Generalizing Across Diverse Team Strategies in Competitive Pokémon

📅 2025-06-12
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Addressing the poor generalization of AI agents across diverse team strategies, the paper proposes the VGC-Bench benchmark, evaluates approaches including language-model agents and reinforcement learning, and finds that cross-team generalization remains a challenge.

📝 Abstract
Developing AI agents that can robustly adapt to dramatically different strategic landscapes without retraining is a central challenge for multi-agent learning. Pokémon Video Game Championships (VGC) is a domain with an extraordinarily large space of possible team configurations of approximately $10^{139}$ - far larger than those of Dota or StarCraft. The highly discrete, combinatorial nature of team building in Pokémon VGC causes optimal strategies to shift dramatically depending on both the team being piloted and the opponent's team, making generalization uniquely challenging. To advance research on this problem, we introduce VGC-Bench: a benchmark that provides critical infrastructure, standardizes evaluation protocols, and supplies human-play datasets and a range of baselines - from large-language-model agents and behavior cloning to reinforcement learning and empirical game-theoretic methods such as self-play, fictitious play, and double oracle. In the restricted setting where an agent is trained and evaluated on a single-team configuration, our methods are able to win against a professional VGC competitor. We extensively evaluate all baseline methods over progressively larger team sets and find that even the best-performing algorithm in the single-team setting struggles to scale as the team set grows. Thus, policy generalization across diverse team strategies remains an open challenge for the community. Our code is open sourced at https://github.com/cameronangliss/VGC-Bench.
Problem

Research questions and friction points this paper is trying to address.

Developing AI agents adaptable to diverse strategic landscapes
Generalizing across large combinatorial team configurations
Scaling AI performance with increasing team strategy diversity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces VGC-Bench benchmark for diverse strategies
Uses LLM agents, behavior cloning, reinforcement learning
Tests self-play, fictitious play, double oracle methods
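To make the empirical game-theoretic baselines named above concrete, here is a minimal sketch of the double-oracle loop on a toy zero-sum game (rock-paper-scissors), with a fictitious-play subroutine to approximate equilibria of the restricted game. The toy game, function names, and loop structure are illustrative assumptions, not taken from the VGC-Bench codebase.

```python
# Minimal double-oracle sketch on rock-paper-scissors (toy zero-sum game).
# Everything here is an illustrative assumption, not VGC-Bench code.

FULL = ["rock", "paper", "scissors"]  # the full (here tiny) strategy space

def payoff(a, b):
    """Row player's payoff: +1 win, -1 loss, 0 draw."""
    beats = {"rock": "scissors", "paper": "rock", "scissors": "paper"}
    return 0 if a == b else (1 if beats[a] == b else -1)

def fictitious_play(row_set, col_set, iters=2000):
    """Approximate an equilibrium of the restricted game via fictitious play."""
    rc, cc = [0] * len(row_set), [0] * len(col_set)
    rc[0] = cc[0] = 1
    for _ in range(iters):
        # each player best-responds to the opponent's empirical mixture
        br_r = max(range(len(row_set)), key=lambda i: sum(
            cc[j] * payoff(row_set[i], col_set[j]) for j in range(len(col_set))))
        br_c = max(range(len(col_set)), key=lambda j: -sum(
            rc[i] * payoff(row_set[i], col_set[j]) for i in range(len(row_set))))
        rc[br_r] += 1
        cc[br_c] += 1
    return [c / sum(rc) for c in rc], [c / sum(cc) for c in cc]

def row_oracle(col_mix, col_set):
    """Best-response oracle: best pure strategy in FULL vs. column mixture."""
    return max(FULL, key=lambda a: sum(
        p * payoff(a, b) for p, b in zip(col_mix, col_set)))

def col_oracle(row_mix, row_set):
    """Best-response oracle: best pure strategy in FULL vs. row mixture."""
    return max(FULL, key=lambda b: -sum(
        p * payoff(a, b) for p, a in zip(row_mix, row_set)))

rows, cols = ["rock"], ["rock"]  # start from small restricted strategy sets
for _ in range(5):
    row_mix, col_mix = fictitious_play(rows, cols)
    br_r, br_c = row_oracle(col_mix, cols), col_oracle(row_mix, rows)
    if br_r in rows and br_c in cols:
        break  # no new best responses: the restricted game is stable
    if br_r not in rows:
        rows.append(br_r)
    if br_c not in cols:
        cols.append(br_c)
# after a few expansions, both restricted sets cover all of FULL
```

In VGC the "strategy space" would be team/policy choices rather than pure actions, and the best-response oracle would itself be a reinforcement-learning run, but the expand-solve loop has the same shape.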
Cameron Angliss
Dept. of Computer Science, University of Texas at Austin
Jiaxun Cui
The University of Texas at Austin
Reinforcement Learning, Multi-agent Learning, Game Theory
Jiaheng Hu
The University of Texas at Austin
Robot Learning, Reinforcement Learning, Robotics, Mobile Manipulation
Arrasy Rahman
Dept. of Computer Science, University of Texas at Austin
Peter Stone
Dept. of Computer Science, University of Texas at Austin