🤖 AI Summary
This study addresses the challenge of reliably translating complex game designs into executable Unity code while preserving the intended gameplay semantics. To this end, the authors condition two open-source large language models, DeepSeek-Coder-V2-Lite-Instruct and Qwen2.5-Coder-7B-Instruct, on a human-authored, Unity-specific intermediate representation (IR) to guide the generation of structured game code. The work presents the first systematic evaluation of whether LLMs can produce compilable Unity games under explicit gameplay constraints. Experimental results show that the IR significantly improves compilation success rates, and they identify structural and project-level grounding failures as the primary bottlenecks. These findings expose a concrete limitation of current LLMs and suggest a new research direction in machine creativity and game development automation.
📝 Abstract
Creatively translating complex gameplay ideas into executable artifacts (e.g., games as Unity projects and code) remains a central challenge in computational game creativity. Gameplay design patterns provide a structured representation for describing gameplay phenomena, enabling designers to decompose high-level ideas into entities, constraints, and rule-driven dynamics. Among them, goal patterns formalize common player-objective relationships. Goal Playable Concepts (GPCs) operationalize these abstractions as playable Unity engine implementations, supporting experiential exploration and compositional gameplay design. We frame scalable playable pattern realization as a problem of constrained executable creative synthesis: generated artifacts must satisfy Unity's syntactic and architectural requirements while preserving the semantic gameplay meanings encoded in goal patterns. This dual constraint limits scalability. We therefore investigate whether contemporary large language models (LLMs) can perform such synthesis under engine-level structural constraints and generate Unity code (as games) structured by and conditioned on goal playable patterns. Using 26 goal pattern instantiations, we compare a direct generation baseline (natural language -> C# -> Unity) with pipelines conditioned on a human-authored Unity-specific intermediate representation (IR), across three IR configurations and two open-source models (DeepSeek-Coder-V2-Lite-Instruct and Qwen2.5-Coder-7B-Instruct). Compilation success is evaluated via automated Unity replay. We propose a categorization of failures into grounding and hygiene failure modes and identify structural and project-level grounding as the primary bottlenecks.
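To make the size of the comparison concrete, the following Python sketch enumerates the evaluation grid the abstract describes (26 goal pattern instantiations, one direct baseline plus three IR configurations, two models) and computes a per-(model, condition) compilation success rate. It is a minimal illustration under stated assumptions, not the authors' harness: the condition labels, the one-generation-per-cell assumption, and the helper names `Run`, `build_grid`, and `compile_success_rate` are hypothetical. The compiled/not-compiled outcome for each run would come from the automated Unity replay, which is not modeled here.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import product

# Hypothetical condition labels: the abstract names a direct baseline and
# three IR configurations but does not name the IR configurations.
CONDITIONS = ["direct_baseline", "ir_config_a", "ir_config_b", "ir_config_c"]
MODELS = ["DeepSeek-Coder-V2-Lite-Instruct", "Qwen2.5-Coder-7B-Instruct"]
NUM_INSTANTIATIONS = 26  # goal pattern instantiations, per the abstract


@dataclass(frozen=True)
class Run:
    """One generation attempt: (model, condition, pattern instantiation index)."""
    model: str
    condition: str
    instance: int


def build_grid() -> list[Run]:
    """Enumerate every model x condition x instantiation cell.

    Assumes one generation per cell; the abstract does not state how many
    samples were drawn per cell.
    """
    return [
        Run(model, condition, instance)
        for model, condition, instance in product(
            MODELS, CONDITIONS, range(NUM_INSTANTIATIONS)
        )
    ]


def compile_success_rate(outcomes: dict[Run, bool]) -> dict[tuple[str, str], float]:
    """Fraction of runs per (model, condition) whose Unity project compiled."""
    totals: dict[tuple[str, str], int] = {}
    passed: dict[tuple[str, str], int] = {}
    for run, compiled in outcomes.items():
        key = (run.model, run.condition)
        totals[key] = totals.get(key, 0) + 1
        passed[key] = passed.get(key, 0) + int(compiled)
    return {key: passed[key] / totals[key] for key in totals}


if __name__ == "__main__":
    print(len(build_grid()))  # prints 208
```

Under the one-generation-per-cell assumption, the grid contains 2 models x 4 conditions x 26 instantiations = 208 runs, and each success rate is computed over the 26 instantiations of a single (model, condition) pair. Keying outcomes by a frozen dataclass keeps each run hashable and makes the grouping explicit.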