Diamonds in the rough: Transforming SPARCs of imagination into a game concept by leveraging medium sized LLMs

📅 2025-09-29
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study investigates the feasibility and effectiveness of medium-scale large language models (LLMs) in supporting early-stage game concept design under resource-constrained environments—such as consumer-grade hardware—targeting indie studios and solo developers. Method: We propose a structured prompt engineering–based, multi-dimensional automated evaluation framework to systematically compare LLaMA-3.1, Qwen-2.5, and DeepSeek-R1 on game-concept quality assessment, including narrative coherence, mechanic novelty, and design feasibility. Contribution/Results: We provide the first empirical evidence that such LLMs can generate stable, constructive feedback on narrative and gameplay mechanics; DeepSeek-R1 achieves superior consistency and practical utility. In pedagogical and creative prototyping experiments, most students rated the feedback as high-quality and expressed willingness to integrate it into their workflows. Results confirm that lightweight LLMs are now capable of meaningfully supporting real-world game narrative design, offering a reusable methodology and empirical foundation for AI-augmented creative production.

📝 Abstract
Recent research has demonstrated that large language models (LLMs) can support experts across various domains, including game design. In this study, we examine the utility of medium-sized LLMs, models that operate on consumer-grade hardware typically available in small studios or home environments. We began by identifying ten key aspects that contribute to a strong game concept and used ChatGPT to generate thirty sample game ideas. Three medium-sized LLMs, LLaMA 3.1, Qwen 2.5, and DeepSeek-R1, were then prompted to evaluate these ideas according to the previously identified aspects. A qualitative assessment by two researchers compared the models' outputs, revealing that DeepSeek-R1 produced the most consistently useful feedback, despite some variability in quality. To explore real-world applicability, we ran a pilot study with ten students enrolled in a storytelling course for game development. At the early stages of their own projects, students used our prompt and DeepSeek-R1 to refine their game concepts. The results indicate a positive reception: most participants rated the output as high quality and expressed interest in using such tools in their workflows. These findings suggest that current medium-sized LLMs can provide valuable feedback in early game design, though further refinement of prompting methods could improve consistency and overall effectiveness.
Problem

Research questions and friction points this paper is trying to address.

Evaluating medium-sized LLMs for game concept feedback generation
Assessing LLM utility in early-stage game design processes
Comparing different medium-sized models on consumer hardware
Innovation

Methods, ideas, or system contributions that make the work stand out.

Medium-sized LLMs evaluate game concepts
DeepSeek-R1 provides consistent feedback quality
Prompt-based refinement enhances game design workflow
Julian Geheeb
Technical University of Munich, Arcisstraße 21, 80333 Munich, Germany
Farhan Abid Ivan
Technical University of Munich, Arcisstraße 21, 80333 Munich, Germany
Daniel Dyrda
Technical University of Munich, Arcisstraße 21, 80333 Munich, Germany
Miriam Anschütz
PhD Student of Computer Science, Technical University of Munich
Natural language processing · easy-to-read · text simplification
Georg Groh
Adjunct Professor
Social Computing · Natural Language Processing