A Benchmark for Generalizing Across Diverse Team Strategies in Competitive Pokémon

📅 2025-06-12
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Addressing the poor generalization of AI agents across diverse team strategies, the paper proposes the VGC-Bench benchmark, evaluates approaches including language-model agents and reinforcement learning, and finds that cross-team generalization remains a challenge.

📝 Abstract
Developing AI agents that can robustly adapt to dramatically different strategic landscapes without retraining is a central challenge for multi-agent learning. Pokémon Video Game Championships (VGC) is a domain with an extraordinarily large space of possible team configurations of approximately $10^{139}$ - far larger than those of Dota or StarCraft. The highly discrete, combinatorial nature of team building in Pokémon VGC causes optimal strategies to shift dramatically depending on both the team being piloted and the opponent's team, making generalization uniquely challenging. To advance research on this problem, we introduce VGC-Bench: a benchmark that provides critical infrastructure, standardizes evaluation protocols, and supplies human-play datasets and a range of baselines - from large-language-model agents and behavior cloning to reinforcement learning and empirical game-theoretic methods such as self-play, fictitious play, and double oracle. In the restricted setting where an agent is trained and evaluated on a single-team configuration, our methods are able to win against a professional VGC competitor. We extensively evaluate all baseline methods over progressively larger team sets and find that even the best-performing algorithm in the single-team setting struggles to scale as the team set grows. Thus, policy generalization across diverse team strategies remains an open challenge for the community. Our code is open sourced at https://github.com/cameronangliss/VGC-Bench.
Problem

Research questions and friction points this paper is trying to address.

Developing AI agents adaptable to diverse strategic landscapes
Generalizing across large combinatorial team configurations
Scaling AI performance with increasing team strategy diversity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces VGC-Bench benchmark for diverse strategies
Uses LLM agents, behavior cloning, reinforcement learning
Tests self-play, fictitious play, double oracle methods
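To make the empirical game-theoretic baselines named above concrete, here is a minimal sketch of the double-oracle loop on a toy zero-sum game (rock-paper-scissors), with a fictitious-play subroutine to approximate equilibria of the restricted game. The toy game, function names, and loop structure are illustrative assumptions, not taken from the VGC-Bench codebase.

```python
# Minimal double-oracle sketch on rock-paper-scissors (toy zero-sum game).
# Everything here is an illustrative assumption, not VGC-Bench code.

FULL = ["rock", "paper", "scissors"]  # the full (here tiny) strategy space

def payoff(a, b):
    """Row player's payoff: +1 win, -1 loss, 0 draw."""
    beats = {"rock": "scissors", "paper": "rock", "scissors": "paper"}
    return 0 if a == b else (1 if beats[a] == b else -1)

def fictitious_play(row_set, col_set, iters=2000):
    """Approximate an equilibrium of the restricted game via fictitious play."""
    rc, cc = [0] * len(row_set), [0] * len(col_set)
    rc[0] = cc[0] = 1
    for _ in range(iters):
        # each player best-responds to the opponent's empirical mixture
        br_r = max(range(len(row_set)), key=lambda i: sum(
            cc[j] * payoff(row_set[i], col_set[j]) for j in range(len(col_set))))
        br_c = max(range(len(col_set)), key=lambda j: -sum(
            rc[i] * payoff(row_set[i], col_set[j]) for i in range(len(row_set))))
        rc[br_r] += 1
        cc[br_c] += 1
    return [c / sum(rc) for c in rc], [c / sum(cc) for c in cc]

def row_oracle(col_mix, col_set):
    """Best-response oracle: best pure strategy in FULL vs. column mixture."""
    return max(FULL, key=lambda a: sum(
        p * payoff(a, b) for p, b in zip(col_mix, col_set)))

def col_oracle(row_mix, row_set):
    """Best-response oracle: best pure strategy in FULL vs. row mixture."""
    return max(FULL, key=lambda b: -sum(
        p * payoff(a, b) for p, a in zip(row_mix, row_set)))

rows, cols = ["rock"], ["rock"]  # start from small restricted strategy sets
for _ in range(5):
    row_mix, col_mix = fictitious_play(rows, cols)
    br_r, br_c = row_oracle(col_mix, cols), col_oracle(row_mix, rows)
    if br_r in rows and br_c in cols:
        break  # no new best responses: the restricted game is stable
    if br_r not in rows:
        rows.append(br_r)
    if br_c not in cols:
        cols.append(br_c)
# after a few expansions, both restricted sets cover all of FULL
```

In VGC the "strategy space" would be team/policy choices rather than pure actions, and the best-response oracle would itself be a reinforcement-learning run, but the expand-solve loop has the same shape.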
Cameron Angliss
Dept. of Computer Science, University of Texas at Austin
Jiaxun Cui
The University of Texas at Austin
Reinforcement Learning, Multi-agent Learning, Game Theory
Jiaheng Hu
The University of Texas at Austin
Robot Learning, Reinforcement Learning, Robotics, Mobile Manipulation
Arrasy Rahman
Dept. of Computer Science, University of Texas at Austin
Peter Stone
Dept. of Computer Science, University of Texas at Austin