🤖 AI Summary
Problem: Automating end-to-end procedural game design remains challenging, particularly for constrained domain-specific languages such as PuzzleScript. Generated content must satisfy stringent syntactic and semantic requirements, so existing approaches still rely on human-in-the-loop validation and iteration.
Method: We propose a closed-loop paradigm that couples an LLM with the game engine: (1) an LLM generates initial game logic from human-authored few-shot examples; (2) compiler feedback from the PuzzleScript engine is used to iteratively repair syntax and logic errors; and (3) a tree-search agent autonomously plays the generated games to simulate gameplay and verify solvability.
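The three-step loop can be sketched as follows. This is a minimal illustration under stated assumptions, not ScriptDoctor's implementation. `design_loop`, `generate`, `compile_errors` and `is_solvable` are hypothetical stand-ins for the LLM call, the PuzzleScript compiler hook and the search-based playtester. The toy stubs at the end only mimic their interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class LoopResult:
    source: Optional[str]  # last candidate game script
    compiled: bool         # True if the last candidate compiled without errors
    solvable: bool         # True if the playtester found a solution
    attempts: int          # number of generation calls made
    history: List[List[str]] = field(default_factory=list)  # compiler errors per attempt


def design_loop(
    generate: Callable[[str], str],              # hypothetical LLM call: prompt -> game source
    compile_errors: Callable[[str], List[str]],  # hypothetical compiler hook: source -> errors
    is_solvable: Callable[[str], bool],          # hypothetical search-based playtester
    few_shot: str,                               # human-authored example games
    max_repairs: int = 5,
) -> LoopResult:
    prompt = few_shot + "\nWrite a new PuzzleScript game."
    history: List[List[str]] = []
    source: Optional[str] = None
    for attempt in range(1, max_repairs + 2):  # one initial draft plus max_repairs repairs
        source = generate(prompt)
        errors = compile_errors(source)
        history.append(errors)
        if not errors:
            # Only code that compiles is handed to the playtester.
            return LoopResult(source, True, is_solvable(source), attempt, history)
        # Feed the compiler errors back so the next draft targets them.
        prompt = (
            few_shot
            + "\nFix this PuzzleScript game:\n" + source
            + "\nCompiler errors:\n" + "\n".join(errors)
        )
    return LoopResult(source, False, False, max_repairs + 1, history)


# Toy stubs; the strings are placeholders, not complete PuzzleScript games.
drafts = iter(["OBJECTS", "OBJECTS\nLEGEND", "OBJECTS\nLEGEND\nLEVELS"])
result = design_loop(
    generate=lambda prompt: next(drafts),
    compile_errors=lambda src: [] if "LEVELS" in src else ["missing LEVELS section"],
    is_solvable=lambda src: True,
    few_shot="(example games)",
)
print(result.attempts, result.compiled, result.solvable)  # 3 True True
```

In this sketch the playtester runs only on code that compiles. Compiler errors are appended to the next prompt together with the failing source, which matches the repair step described above. How the real system words its prompts or bounds retries is not specified here.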
Contribution/Results: This is the first fully automated pipeline that integrates concept generation, code correction, and playability assessment for puzzle games without human intervention. Evaluated on PuzzleScript, the approach produces many syntactically valid, solvable, and novel puzzle games. It demonstrates that LLMs tightly coupled with domain-specific engines and search-based verification can autonomously evolve functional game designs under strong formal constraints, advancing LLM capabilities in constrained program synthesis and creative AI.
📝 Abstract
There is much interest in using large pre-trained models in Automatic Game Design (AGD), whether via the generation of code, assets, or more abstract conceptualization of design ideas. But so far this interest largely stems from the ad hoc use of such generative models under persistent human supervision. Much work remains to show how these tools can be integrated into longer-time-horizon AGD pipelines, in which systems interface with game engines to test generated content autonomously. To this end, we introduce ScriptDoctor, a Large Language Model (LLM)-driven system for automatically generating and testing games in PuzzleScript, an expressive but highly constrained description language for turn-based puzzle games over 2D gridworlds. ScriptDoctor generates and tests game design ideas in an iterative loop, where human-authored examples are used to ground the system's output, compilation errors from the PuzzleScript engine are used to elicit functional code, and search-based agents play-test generated games. ScriptDoctor serves as a concrete example of the potential of automated, open-ended LLM-based workflows in generating novel game content.
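The search-based playtesting mentioned above can be illustrated with a breadth-first search over game states. The sketch below is an assumption-laden stand-in, not ScriptDoctor's agent. It does not interpret PuzzleScript rules. Instead, it hard-codes a Sokoban-style block-pushing move rule and an "every target covered by a crate" win condition, as in PuzzleScript's introductory examples. The `parse` and `solve` names and the single-character legend are illustrative assumptions.

```python
from collections import deque
from typing import FrozenSet, List, Optional, Set, Tuple

Pos = Tuple[int, int]
# Direction vectors as (row delta, column delta).
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def parse(level: str) -> Tuple[Set[Pos], Set[Pos], Pos, FrozenSet[Pos]]:
    """Assumed legend: # wall, P player, * crate, O target, @ crate on target, . floor."""
    floor: Set[Pos] = set()
    targets: Set[Pos] = set()
    crates: Set[Pos] = set()
    player: Optional[Pos] = None
    for r, row in enumerate(level.strip().splitlines()):
        for c, ch in enumerate(row):
            if ch == "#":
                continue
            floor.add((r, c))
            if ch == "P":
                player = (r, c)
            if ch in "*@":
                crates.add((r, c))
            if ch in "O@":
                targets.add((r, c))
    if player is None:
        raise ValueError("level has no player")
    return floor, targets, player, frozenset(crates)


def solve(level: str, max_states: int = 100_000) -> Optional[List[str]]:
    """Breadth-first search; returns a shortest move list, or None if none was found."""
    floor, targets, player, crates = parse(level)
    start = (player, crates)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier and len(seen) <= max_states:
        (pos, boxes), path = frontier.popleft()
        if targets <= boxes:  # win condition: every target is covered by a crate
            return path
        for name, (dr, dc) in MOVES.items():
            nxt = (pos[0] + dr, pos[1] + dc)
            if nxt not in floor:  # walls and off-grid cells block movement
                continue
            new_boxes = boxes
            if nxt in boxes:  # moving into a crate pushes it one cell further
                beyond = (nxt[0] + dr, nxt[1] + dc)
                if beyond not in floor or beyond in boxes:
                    continue
                new_boxes = (boxes - {nxt}) | {beyond}
            state = (nxt, new_boxes)
            if state not in seen:
                seen.add(state)
                frontier.append((state, path + [name]))
    return None


print(solve("#####\n#P*O#\n#####"))  # ['right']
print(solve("#####\n#*PO#\n#####"))  # None: the crate is stuck against the wall
```

States are deduplicated in `seen`, so the search terminates on any finite level. The `max_states` budget is an assumed parameter, and a `None` result therefore means only that no solution was found within it. A practical playtester would also need to execute the game's own rules rather than a fixed move model.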