🤖 AI Summary
Current vision-language models (VLMs) lack standardized, task-specific benchmarks for video game quality assurance (QA), hindering rigorous evaluation of their practical capabilities. To address this, we introduce VideoGameQA-Bench, the first multi-task VLM benchmark dedicated to game QA. It systematically defines and quantifies core competencies: visual unit testing, visual regression testing, needle-in-a-haystack tasks, glitch detection, and bug report generation for both images and videos of various games. The benchmark features fine-grained annotations derived from real game screenshots and screen recordings, supporting multimodal inputs and cross-game generalization. We propose a unified evaluation framework integrating visual understanding, cross-modal alignment, and defect-driven language generation. Empirical analysis reveals critical bottlenecks of state-of-the-art VLMs on game QA tasks. All code and data are publicly released to foster collaborative advancement of automated game testing in both industry and academia.
📝 Abstract
With video games now generating the highest revenues in the entertainment industry, optimizing game development workflows has become essential for the sector's sustained growth. Recent advancements in Vision-Language Models (VLMs) offer considerable potential to automate and enhance various aspects of game development, particularly Quality Assurance (QA), which remains one of the industry's most labor-intensive processes with limited automation options. To accurately evaluate the performance of VLMs in video game QA tasks and determine their effectiveness in handling real-world scenarios, there is a clear need for standardized benchmarks, as existing benchmarks are insufficient to address the specific requirements of this domain. To bridge this gap, we introduce VideoGameQA-Bench, a comprehensive benchmark that covers a wide array of game QA activities, including visual unit testing, visual regression testing, needle-in-a-haystack tasks, glitch detection, and bug report generation for both images and videos of various games. Code and data are available at: https://asgaardlab.github.io/videogameqa-bench/