Leveraging LLM Agents for Automated Video Game Testing

📅 2025-09-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address low state coverage, weak long-horizon task reasoning, and difficulty detecting deep logical defects in automated MMORPG testing, this paper proposes TITAN, a large language model (LLM)-based intelligent test agent. TITAN integrates action trajectory memory with a reflective self-correction mechanism to enable high-dimensional game state abstraction and long-horizon decision-making. It introduces an LLM-powered judge module that serves as an interpretable oracle for diagnosing deep logical bugs. Further, it constructs an end-to-end test pipeline combining state awareness, ranking-based action optimization, and memory-augmented reasoning. Evaluated on two large commercial MMORPGs, TITAN achieves a 95% task completion rate, uncovers four critical functional bugs missed by existing tools, and has been deployed in eight real-world game quality assurance workflows, demonstrating both technical efficacy and engineering practicality.
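The pipeline summarized above (state abstraction, ranked action selection, memory-augmented reasoning, and an LLM judge oracle) can be sketched roughly as follows. This is a minimal illustration, not TITAN's actual implementation; every class, function, and field name here is a hypothetical stand-in, and the LLM call is stubbed out.

```python
# Hypothetical sketch of a TITAN-style testing loop. All names are
# illustrative assumptions, not taken from the paper's code.

def stub_llm(prompt):
    # Stand-in for a real LLM call: pick the top-ranked candidate action.
    return prompt["actions"][0]

class TestAgent:
    def __init__(self, llm):
        self.llm = llm
        self.trajectory = []  # action trace memory for long-horizon reasoning

    def abstract_state(self, raw_state):
        # Reduce a high-dimensional game state to the fields the LLM needs.
        return {k: raw_state[k] for k in ("scene", "quest", "hp")}

    def rank_actions(self, actions):
        # Prioritize actions before prompting; here we simply prefer
        # quest-related actions (an illustrative heuristic).
        return sorted(actions, key=lambda a: 0 if "quest" in a else 1)

    def step(self, raw_state, actions):
        state = self.abstract_state(raw_state)
        ranked = self.rank_actions(actions)
        # Recent trajectory is included so the model can self-correct.
        action = self.llm({"state": state, "actions": ranked,
                           "memory": self.trajectory[-5:]})
        self.trajectory.append(action)
        return action

    def judge(self, expected, observed):
        # Oracle check: flag a logic bug when outcomes diverge. In TITAN
        # this role is played by an LLM judge producing diagnostic reports.
        if expected == observed:
            return "ok"
        return f"bug: expected {expected}, got {observed}"
```

For example, `TestAgent(stub_llm).step({"scene": "town", "quest": "fetch_herbs", "hp": 100}, ["attack", "accept_quest"])` would select `"accept_quest"`, since the ranking heuristic places quest actions first and the stub picks the top candidate.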

📝 Abstract
Testing MMORPGs (Massively Multiplayer Online Role-Playing Games) is a critical yet labor-intensive task in game development due to their complexity and frequent updates. Traditional automated game testing approaches struggle to achieve high state coverage and efficiency in these rich, open-ended environments, while existing LLM-based game-playing approaches are limited by shallow reasoning over complex game state-action spaces and long-horizon tasks. To address these challenges, we propose TITAN, an effective LLM-driven agent framework for intelligent MMORPG testing. TITAN incorporates four key components that: (1) perceive and abstract high-dimensional game states, (2) proactively optimize and prioritize available actions, (3) enable long-horizon reasoning with action trace memory and reflective self-correction, and (4) employ LLM-based oracles to detect potential functional and logic bugs and produce diagnostic reports. We implement a prototype of TITAN and evaluate it on two large-scale commercial MMORPGs spanning both PC and mobile platforms. In our experiments, TITAN achieves a significantly higher task completion rate (95%) and better bug detection performance than existing automated game testing approaches. An ablation study further demonstrates that each core component of TITAN contributes substantially to its overall performance. Notably, TITAN detects four previously unknown bugs that prior testing approaches fail to identify. We provide an in-depth discussion of these results, which offers guidance for advancing intelligent, general-purpose testing systems. Moreover, TITAN has been deployed in eight real-world game QA pipelines, underscoring its practical impact as an LLM-driven game testing framework.
Problem

Research questions and friction points this paper is trying to address.

Automating labor-intensive MMORPG testing with LLM agents
Enhancing state coverage and efficiency in complex game environments
Detecting functional and logic bugs through intelligent reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM agent framework for automated game testing
Perceives and abstracts high-dimensional game states
Enables long-horizon reasoning with self-correction
Employs LLM-based oracles for bug detection