🤖 AI Summary
This study investigates the feasibility and effectiveness of medium-sized large language models (LLMs) for supporting early-stage game concept design in resource-constrained environments, such as consumer-grade hardware, with indie studios and solo developers as the target users.
Method: We propose a structured, prompt-engineering-based, multi-dimensional evaluation framework and use it to systematically compare LLaMA-3.1, Qwen-2.5, and DeepSeek-R1 on game concept quality assessment, covering aspects such as narrative coherence, mechanic novelty, and design feasibility.
Contribution/Results: We provide empirical evidence that such LLMs can generate constructive feedback on narrative and gameplay mechanics, with DeepSeek-R1 delivering the most consistent and practically useful output. In a pilot study with students in a game-development storytelling course, most participants rated the feedback as high quality and expressed willingness to integrate it into their workflows. The results suggest that medium-sized LLMs can now meaningfully support real-world game narrative design, offering a reusable methodology and an empirical foundation for AI-augmented creative production.
📝 Abstract
Recent research has demonstrated that large language models (LLMs) can support experts across various domains, including game design. In this study, we examine the utility of medium-sized LLMs, models that operate on consumer-grade hardware typically available in small studios or home environments. We began by identifying ten key aspects that contribute to a strong game concept and used ChatGPT to generate thirty sample game ideas. Three medium-sized LLMs, LLaMA 3.1, Qwen 2.5, and DeepSeek-R1, were then prompted to evaluate these ideas according to the previously identified aspects. A qualitative assessment by two researchers compared the models' outputs, revealing that DeepSeek-R1 produced the most consistently useful feedback, despite some variability in quality. To explore real-world applicability, we ran a pilot study with ten students enrolled in a storytelling course for game development. At the early stages of their own projects, students used our prompt and DeepSeek-R1 to refine their game concepts. The results indicate a positive reception: most participants rated the output as high quality and expressed interest in using such tools in their workflows. These findings suggest that current medium-sized LLMs can provide valuable feedback in early game design, though further refinement of prompting methods could improve consistency and overall effectiveness.
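The core of the workflow described above is a single structured prompt that asks a model to critique a game concept along each of the identified aspects. A minimal sketch of such a prompt builder is shown below; the aspect names and the prompt wording here are illustrative assumptions, not the authors' actual ten aspects or prompt.

```python
# Illustrative sketch of the prompting setup described in the abstract.
# The aspect list and the prompt template are assumptions for demonstration,
# not the materials used in the study.

ASPECTS = [  # hypothetical stand-ins for the paper's ten key aspects
    "narrative coherence", "mechanic novelty", "design feasibility",
    "target-audience fit", "originality of setting", "player motivation",
    "scope realism", "art-direction clarity", "replayability", "emotional impact",
]

def build_evaluation_prompt(game_idea: str, aspects: list[str] = ASPECTS) -> str:
    """Compose one structured prompt asking a model to critique a game
    concept aspect by aspect, with a rating and a concrete suggestion each."""
    sections = "\n".join(
        f"{i}. {aspect.title()}: rate 1-5 and give one concrete suggestion."
        for i, aspect in enumerate(aspects, start=1)
    )
    return (
        "You are an experienced game-design mentor. Evaluate the game "
        "concept below on each aspect listed, then summarize the three "
        "most important improvements.\n\n"
        f"Game concept:\n{game_idea}\n\n"
        f"Aspects:\n{sections}"
    )

prompt = build_evaluation_prompt(
    "A cozy roguelike where spells are brewed from foraged ingredients."
)
```

The resulting prompt string would then be sent to a locally hosted medium-sized model, for example through an OpenAI-compatible endpoint exposed by llama.cpp or Ollama; that serving layer is an assumption, as the study does not specify its inference setup.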