🤖 AI Summary
To address prevalent syntax errors and low fidelity to game rules in natural language (NL)-to-Game Description Language (GDL) generation, this paper proposes a two-stage fine-tuning paradigm: supervised fine-tuning (SFT) of large language models (LLMs), followed by dual-objective reinforcement learning (RL). The method jointly optimizes two rewards via Proximal Policy Optimization (PPO): (i) a syntax reward driven by a formal parser to enforce structural correctness, and (ii) a game-concept reward computed by a semantic alignment scorer to preserve rule semantics. Evaluated across multiple GDL benchmarks, the method significantly outperforms an SFT-only baseline: syntax correctness improves by 27%, and fidelity to critical game concepts increases by 31%. To the authors' knowledge, this is the first RL framework for NL-to-GDL generation that explicitly integrates structured syntactic constraints with deep semantic alignment.
📝 Abstract
Game Description Generation (GDG) is the task of generating a game description written in a Game Description Language (GDL) from natural language text. Previous studies have explored generation methods leveraging the contextual understanding capabilities of Large Language Models (LLMs); however, accurately reproducing the intended game features in the generated descriptions remains a challenge. In this paper, we propose reinforcement learning-based fine-tuning of LLMs for GDG (RLGDG). Our training method simultaneously improves grammatical correctness and fidelity to game concepts by introducing both grammar rewards and concept rewards. Furthermore, we adopt a two-stage training strategy in which Reinforcement Learning (RL) is applied after Supervised Fine-Tuning (SFT). Experimental results demonstrate that our proposed method significantly outperforms baseline methods that use SFT alone.
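The dual-reward idea above can be sketched as a single scalar signal combining a parser-based grammar check with a concept-fidelity score. This is a minimal illustrative sketch, not the paper's implementation: `parse_gdl` is a toy balanced-parentheses check standing in for a real GDL parser, and `concept_score` is a toy keyword-coverage score standing in for the semantic alignment scorer; the weights are assumed, not taken from the paper.

```python
def parse_gdl(description: str) -> bool:
    """Toy syntax check: stand-in for a real GDL parser.

    Accepts a string iff its parentheses are balanced, which is a
    necessary (though far from sufficient) condition for valid GDL.
    """
    depth = 0
    for ch in description:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def concept_score(description: str, concepts: list[str]) -> float:
    """Toy concept-fidelity score: fraction of required game concepts
    (e.g. 'players', 'goal', 'terminal') mentioned in the output.
    A real scorer would use semantic alignment, not substring matching.
    """
    if not concepts:
        return 1.0
    hits = sum(1 for c in concepts if c in description)
    return hits / len(concepts)


def reward(description: str, concepts: list[str],
           w_syntax: float = 0.5, w_concept: float = 0.5) -> float:
    """Weighted sum of grammar and concept rewards, of the kind an RL
    trainer (e.g. PPO) could maximize. Weights are hypothetical.
    """
    r_syntax = 1.0 if parse_gdl(description) else 0.0
    return w_syntax * r_syntax + w_concept * concept_score(description, concepts)
```

In a two-stage pipeline like the one described, this reward would only be applied in the second (RL) stage, after SFT has already taught the model the surface form of GDL.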