🤖 AI Summary
Current vision-language models (VLMs) lack standardized, task-specific benchmarks for video game quality assurance (QA), hindering rigorous evaluation of their practical capabilities. To address this, we introduce VideoGameQA-Bench, the first multi-task VLM benchmark dedicated to game QA. It systematically defines and quantifies core competencies: visual unit testing, visual regression testing, needle-in-a-haystack tasks, glitch detection, and bug report generation for both images and videos of various games. The benchmark features fine-grained annotations derived from real game screenshots and screen recordings, supporting multimodal inputs and cross-game generalization. We propose a unified evaluation framework integrating visual understanding, cross-modal alignment, and defect-driven language generation. Empirical analysis reveals critical bottlenecks of state-of-the-art VLMs on game QA tasks. All code and data are publicly released to foster collaborative advancement of automated game testing in both industry and academia.
📝 Abstract
With video games now generating the highest revenues in the entertainment industry, optimizing game development workflows has become essential for the sector's sustained growth. Recent advancements in Vision-Language Models (VLMs) offer considerable potential to automate and enhance various aspects of game development, particularly Quality Assurance (QA), which remains one of the industry's most labor-intensive processes with limited automation options. To accurately evaluate the performance of VLMs in video game QA tasks and determine their effectiveness in handling real-world scenarios, there is a clear need for standardized benchmarks, as existing benchmarks are insufficient to address the specific requirements of this domain. To bridge this gap, we introduce VideoGameQA-Bench, a comprehensive benchmark that covers a wide array of game QA activities, including visual unit testing, visual regression testing, needle-in-a-haystack tasks, glitch detection, and bug report generation for both images and videos of various games. Code and data are available at: https://asgaardlab.github.io/videogameqa-bench/