GUI Agents for Continual Game Generation

📅 2026-05-27

🤖 AI Summary

Existing game generation methods typically produce code in a single pass and therefore miss playability failures that only surface during interaction. To address this, the authors introduce PlaytestArena, an evaluation environment in which a browser-based GUI agent loads and plays each generated build and judges it against rubrics of expected in-play behaviors, and Play2Code, a continual generation framework in which a game agent and a GUI agent share memory in a closed “generate–play–feedback” loop. Play2Code achieves a rubric pass rate of 66.8%, which is 37.1 and 14.6 percentage points higher than single-pass generation and agentic-coding baselines, respectively.
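The baseline scores are not stated directly, but they can be derived from the reported figures. The short sketch below does that arithmetic, assuming the reported gains are absolute percentage-point differences; the constant and function names are illustrative only, and the derived values are implied rather than quoted from the paper.

```python
# Figures reported in the summary (percent and percentage points).
PLAY2CODE_PASS_RATE = 66.8
GAIN_OVER_SINGLE_PASS = 37.1
GAIN_OVER_AGENTIC_CODING = 14.6


def implied_baseline(system_rate: float, gain_points: float) -> float:
    """Return the baseline rate implied by an absolute percentage-point gain."""
    # Round to one decimal to match the precision of the reported numbers.
    return round(system_rate - gain_points, 1)


single_pass_rate = implied_baseline(PLAY2CODE_PASS_RATE, GAIN_OVER_SINGLE_PASS)
agentic_coding_rate = implied_baseline(PLAY2CODE_PASS_RATE, GAIN_OVER_AGENTIC_CODING)

print(f"single-pass baseline (implied): {single_pass_rate}%")
print(f"agentic-coding baseline (implied): {agentic_coding_rate}%")
```

Under that assumption, the single-pass baseline passes a bit under a third of the rubric checks and the agentic-coding baseline a little over half.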

📝 Abstract

Generating a game is not the same as making one that can be played. Despite advances in code generation, existing approaches treat game generation as one-shot translation from prompt to artifact, leaving interaction-level failures undetected. We argue that evaluating and improving game generation requires a player, and study two roles for graphical user interface (GUI) agents in this process: (1) as an objective evaluator, for which we introduce PlaytestArena, a new evaluation environment that pairs 200 browser-based game generation tasks across eight genres with rubrics of expected in-play behaviors, adjudicated by a GUI agent that loads each build in a browser and plays it; and (2) as a subjective playtester, for which we propose Play2Code, where a game agent and a GUI agent operate in a sustained loop with shared memory, turning game generation into a dialogue between coding and playing. Our experiments show that even frontier models struggle to generate playable games directly, while Play2Code achieves a 66.8% rubric pass rate, improving over single-pass and agentic-coding baselines by 37.1 and 14.6 points respectively. Further analysis shows that GUI playtester feedback is more traceable than a human report, yet idiosyncratic in ways reminiscent of human testers, establishing game playtesting as a critical testbed for interactive code generation. Our project website is available at https://continual-game-generation.vercel.app/.
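The loop described in the abstract can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the `SharedMemory` fields, the `play2code_loop` and `rubric_pass_rate` names, the stopping rule, and the toy agents are hypothetical stand-ins, not the authors' implementation. A real system would call a code-generating model and drive a browser instead of the stubs below.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class SharedMemory:
    """Hypothetical memory both agents read and write across rounds."""
    builds: List[str] = field(default_factory=list)
    reports: List[str] = field(default_factory=list)


def play2code_loop(
    prompt: str,
    generate: Callable[[str, SharedMemory], str],
    playtest: Callable[[str, SharedMemory], str],
    max_rounds: int = 5,
) -> Tuple[str, SharedMemory]:
    """Alternate coding and playing until the playtester reports no issue."""
    memory = SharedMemory()
    for _ in range(max_rounds):
        build = generate(prompt, memory)   # game agent writes or revises code
        memory.builds.append(build)
        report = playtest(build, memory)   # GUI agent plays the build
        if not report:                     # assumed stop rule: empty report
            break
        memory.reports.append(report)
    return memory.builds[-1], memory


def rubric_pass_rate(results: List[bool]) -> float:
    """Percentage of rubric items judged as passed."""
    return 100.0 * sum(results) / len(results) if results else 0.0


# Toy stand-ins for the two agents, for demonstration only.
def toy_generate(prompt: str, memory: SharedMemory) -> str:
    return f"build-v{len(memory.builds) + 1}"


def toy_playtest(build: str, memory: SharedMemory) -> str:
    # Pretend two issues are found before the build plays cleanly.
    if len(memory.reports) < 2:
        return f"issue found in {build}"
    return ""


final_build, memory = play2code_loop("a simple platformer", toy_generate, toy_playtest)
print(final_build, len(memory.reports))
print(rubric_pass_rate([True, False, True, True]))
```

Passing the agents in as callables keeps the loop independent of any particular model or browser driver, which is one plausible way to separate the coding and playing roles.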

Problem

Research questions and friction points this paper is trying to address.

game generation

playability

GUI agents

interactive code generation

playtesting

Innovation

Methods, ideas, or system contributions that make the work stand out.

GUI agents

continual game generation

PlaytestArena