Best Agent Identification for General Game Playing

📅 2025-07-01

🤖 AI Summary

This paper addresses the problem of efficiently identifying the best-performing agent for each subtask in a multi-task game environment. We frame it as a set of Best-Arm Identification (BAI) problems in the Multi-Armed Bandit (MAB) framework, with one bandit per task and one arm per agent. Our key contribution is the Optimistic Wilson Score (Optimistic-WS) selection mechanism, which ranks every arm across all bandits by its potential regret reduction and so avoids task-specific hyperparameter tuning. By combining the Wilson score interval with an optimistic sampling strategy, Optimistic-WS identifies the best agents more accurately under limited evaluation budgets. Experiments on the GVGAI and Ludii benchmark platforms show that Optimistic-WS reduces average simple regret by 18.7%–32.4% compared to state-of-the-art BAI algorithms, while also generalizing better across tasks and using samples more efficiently.

📝 Abstract

We present an efficient and generalised procedure to accurately identify the best-performing algorithm for each sub-task in a multi-problem domain. Our approach treats this as a set of best arm identification problems for multi-armed bandits, where each bandit corresponds to a specific task and each arm corresponds to a specific algorithm or agent. We propose an optimistic selection process based on the Wilson score interval (Optimistic-WS) that ranks each arm across all bandits in terms of its potential regret reduction. We evaluate the performance of Optimistic-WS on two of the most popular general game domains, the General Video Game AI (GVGAI) framework and the Ludii general game playing system, with the goal of identifying the highest-performing agent for each game within a limited number of trials. Compared to previous best arm identification algorithms for multi-armed bandits, our results demonstrate a substantial improvement in average simple regret. This approach can be used to significantly improve the quality and accuracy of agent evaluation procedures for general game frameworks, as well as for other multi-task domains with high algorithm runtimes.
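
The setup described above (one bandit per task, one arm per agent, Wilson score intervals guiding which pair to evaluate next) can be illustrated with a minimal sketch. The Wilson interval itself is the standard formula; everything else here is an assumption made for illustration, not the paper's actual Optimistic-WS rule: rewards are taken to be win/loss outcomes, the "potential regret reduction" of a task is approximated by the gap between the best challenger's upper bound and the current leader's lower bound (a UGapE-style heuristic), and the names `select_next`, `run`, and `true_probs` are hypothetical.

```python
import math
import random

def wilson_interval(successes, trials, z=1.96):
    """Return (lower, upper) Wilson score bounds for a binomial proportion."""
    if trials == 0:
        return 0.0, 1.0  # no data yet: maximally uncertain
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    centre = (p_hat + z * z / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials)
    )
    return max(0.0, centre - half), min(1.0, centre + half)

def select_next(wins, plays, z=1.96):
    """Pick the next (task, agent) pair to evaluate.

    Illustrative rule (an assumption, not the paper's): choose the task whose
    best challenger could optimistically beat the current leader by the most,
    then play whichever of the two has been evaluated less often.
    """
    best_pair, best_gap = None, -math.inf
    for t in range(len(wins)):
        n_arms = len(wins[t])
        if n_arms < 2:
            continue  # nothing to identify with a single agent
        bounds = [wilson_interval(w, n, z) for w, n in zip(wins[t], plays[t])]
        means = [w / n if n else 0.0 for w, n in zip(wins[t], plays[t])]
        leader = max(range(n_arms), key=means.__getitem__)
        challenger = max(
            (a for a in range(n_arms) if a != leader),
            key=lambda a: bounds[a][1],
        )
        gap = bounds[challenger][1] - bounds[leader][0]
        if gap > best_gap:
            less_played = leader if plays[t][leader] <= plays[t][challenger] else challenger
            best_pair, best_gap = (t, less_played), gap
    return best_pair

def run(true_probs, budget, seed=0):
    """Spend `budget` evaluations, then recommend one agent per task."""
    rng = random.Random(seed)
    wins = [[0] * len(row) for row in true_probs]
    plays = [[0] * len(row) for row in true_probs]
    for _ in range(budget):
        t, a = select_next(wins, plays)
        plays[t][a] += 1
        wins[t][a] += int(rng.random() < true_probs[t][a])  # simulated win/loss
    recommendations = []
    for t in range(len(true_probs)):
        means = [w / n if n else 0.0 for w, n in zip(wins[t], plays[t])]
        recommendations.append(max(range(len(means)), key=means.__getitem__))
    return recommendations, plays

if __name__ == "__main__":
    # Hypothetical win probabilities: 2 tasks (games), 2 and 3 agents.
    recs, plays = run([[0.45, 0.55], [0.7, 0.3, 0.5]], budget=300)
    print("recommended agent per task:", recs)
    print("evaluations per (task, agent):", plays)
```

Because the gap shrinks as both the leader and its challenger accumulate evaluations, the budget in this sketch drifts toward tasks where the best agent is still ambiguous, which is the general behaviour a cross-bandit ranking is meant to produce.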

Problem

Research questions and friction points this paper is trying to address.

Identify best algorithm for sub-tasks in multi-problem domains

Rank algorithms by potential regret reduction using Optimistic-WS

Improve agent evaluation accuracy in general game frameworks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses multi-armed bandits for best arm identification

Applies Wilson score interval for optimistic selection

Evaluates agents in general game domains efficiently
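
The abstract reports results in terms of average simple regret. As a rough sketch of that metric under its usual definition (the exact evaluation protocol is not given here, so the function name and the example values below are hypothetical), the simple regret of a task is the gap between the true mean performance of its best agent and that of the agent actually recommended, averaged over all tasks:

```python
def average_simple_regret(true_means, recommendations):
    """Mean over tasks of (best true mean - true mean of the recommended agent)."""
    regrets = [
        max(task_means) - task_means[rec]
        for task_means, rec in zip(true_means, recommendations)
    ]
    return sum(regrets) / len(regrets)

if __name__ == "__main__":
    # Hypothetical example: task 0 recommends its best agent (regret 0),
    # task 1 recommends a slightly worse agent (regret about 0.1).
    true_means = [[0.2, 0.8], [0.6, 0.5, 0.1]]
    print(average_simple_regret(true_means, [1, 1]))
```

A value of zero means the best agent was recommended for every task; larger values mean the recommendations were, on average, further from the best available agent.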
