🤖 AI Summary
This work addresses the challenge that existing large language models struggle to provide constructive, experience-grounded feedback on tabletop game designs, which limits human-AI collaborative creation. The authors propose MeepleLM, the first framework to combine modeling of player subjective heterogeneity with the Mechanics–Dynamics–Aesthetics (MDA) framework. Working from rulebooks alone, it infers the gameplay experience and generates personalized critiques tailored to distinct player types. The approach comprises a high-quality rulebook–review dataset, structured rulebook refinement, dimension-aware critique sampling, MDA-informed reasoning enhancement, and player persona distillation via fine-tuning. Experiments show that MeepleLM significantly outperforms GPT-5.1 and Gemini3-Pro in both community alignment and critique quality, and user studies report a 70% preference rate, supporting its use as a virtual playtester.
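To make the idea of MDA-informed reasoning more concrete, the sketch below shows one way a training target could chain rulebook mechanics to inferred dynamics, then to a persona-specific aesthetic experience and a final critique. The `MDACritique` class, its field names, and the bracketed tag format are hypothetical illustrations, not the paper's actual data format.

```python
from dataclasses import dataclass


@dataclass
class MDACritique:
    """Hypothetical schema for one MDA-augmented training target.

    Field names and tag format are illustrative assumptions only.
    """
    persona: str     # player archetype whose viewpoint is simulated
    mechanics: str   # rule-level facts taken from the rulebook
    dynamics: str    # inferred runtime behaviour those rules produce
    aesthetics: str  # resulting experience for this persona
    critique: str    # final review-style feedback

    def to_target(self) -> str:
        """Render Persona -> Mechanics -> Dynamics -> Aesthetics -> Critique
        as a single string, preserving the causal order of the chain."""
        return (
            f"[Persona] {self.persona}\n"
            f"[Mechanics] {self.mechanics}\n"
            f"[Dynamics] {self.dynamics}\n"
            f"[Aesthetics] {self.aesthetics}\n"
            f"[Critique] {self.critique}"
        )


example = MDACritique(
    persona="strategy-focused veteran",
    mechanics="Players draft one card per round from a shared market.",
    dynamics="Hate-drafting becomes common once scores diverge.",
    aesthetics="Tense, interactive decisions with little downtime.",
    critique="Rewarding for experienced groups; newcomers may feel punished.",
)
print(example.to_target())
```

Keeping the reasoning steps as separate fields makes it easy to check that each critique is grounded in an explicit rule-to-dynamics link before it states an aesthetic judgment.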
📝 Abstract
Recent advancements have expanded the role of Large Language Models in board games from playing agents to creative co-designers. However, a critical gap remains: current systems lack the capacity to offer constructive critique grounded in the emergent user experience. Bridging this gap is fundamental to effective Human-AI collaboration, as it empowers designers to refine their creations through external perspectives while steering models away from biased or unpredictable outcomes. Automating critique for board games presents two challenges: inferring the latent dynamics that connect rules to gameplay without an explicit game engine, and modeling the subjective heterogeneity of diverse player groups. To address these, we curate a dataset of 1,727 structurally corrected rulebooks and 150K reviews selected via quality scoring and facet-aware sampling. We augment this data with Mechanics-Dynamics-Aesthetics (MDA) reasoning to explicitly bridge the causal gap between written rules and player experience. We further distill player personas and introduce MeepleLM, a specialized model that internalizes persona-specific reasoning patterns to accurately simulate the subjective feedback of diverse player archetypes. Experiments demonstrate that MeepleLM significantly outperforms the latest commercial models (e.g., GPT-5.1, Gemini3-Pro) in community alignment and critique quality, achieving a 70% preference rate in user studies assessing utility. MeepleLM serves as a reliable virtual playtester for general interactive systems, marking a pivotal step towards audience-aligned, experience-aware Human-AI collaboration.
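The review selection step (quality scoring followed by facet-aware sampling) can be illustrated with a minimal sketch: first drop reviews below a quality threshold, then draw a bounded number per facet so that no single aspect of a game dominates the training data. The function `facet_aware_sample`, the dictionary keys `quality` and `facet`, and the facet labels are all hypothetical; the paper's actual scoring model and sampling scheme may differ.

```python
import random
from collections import defaultdict


def facet_aware_sample(reviews, min_quality, per_facet, seed=0):
    """Keep reviews whose quality score meets a threshold, then draw up to
    `per_facet` reviews for each facet to balance coverage.

    `reviews` is a list of dicts with hypothetical keys "text",
    "quality" (a float score) and "facet" (e.g. "theme", "interaction").
    """
    rng = random.Random(seed)  # seeded for reproducible sampling
    by_facet = defaultdict(list)
    for r in reviews:
        if r["quality"] >= min_quality:  # quality-scoring filter
            by_facet[r["facet"]].append(r)
    sampled = []
    for facet in sorted(by_facet):  # deterministic facet order
        pool = by_facet[facet]
        k = min(per_facet, len(pool))
        sampled.extend(rng.sample(pool, k))  # balanced draw per facet
    return sampled


reviews = [
    {"text": "Great engine building", "quality": 0.9, "facet": "mechanics"},
    {"text": "ok", "quality": 0.1, "facet": "mechanics"},
    {"text": "Gorgeous art", "quality": 0.8, "facet": "theme"},
    {"text": "Long downtime", "quality": 0.7, "facet": "interaction"},
    {"text": "Tight scoring", "quality": 0.95, "facet": "mechanics"},
]
picked = facet_aware_sample(reviews, min_quality=0.5, per_facet=1)
print(len(picked))  # 3: one review each from interaction, mechanics, theme
```

Capping the draw per facet trades some raw volume for balance: a heavily reviewed facet such as mechanics cannot crowd out rarer ones like theme or interaction.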