TempGlitch: Evaluating Vision-Language Models for Temporal Glitch Detection in Gameplay Videos

📅 2026-05-20

🤖 AI Summary

This work addresses a limitation of current vision-language models (VLMs) in gameplay videos. They struggle with dynamic glitches, which are revealed by temporal inconsistencies across frames rather than by static spatial anomalies within a single frame. To close this gap, we introduce TempGlitch, a controlled video benchmark for temporal game glitches. It covers five categories of time-dependent anomalies with balanced per-category samples, and it pairs them with glitch-free videos so that models can be evaluated as binary classifiers. Using several multi-frame sampling settings, we systematically evaluate twelve proprietary and open-weight VLMs. The models perform near random chance, and neither denser frame sampling nor larger model size yields reliable gains. This points to a fundamental weakness in their temporal reasoning, and TempGlitch fills a gap in benchmarking for this task.
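The summary does not say how frames are sampled from each video. The helper below is a minimal sketch that assumes uniform sampling: the video is split into k equal segments and the middle frame of each segment is kept. `sample_frame_indices` is a hypothetical name and is not part of TempGlitch's released code. The indices are returned in temporal order because a temporal glitch shows up only in how consecutive sampled frames change.

```python
def sample_frame_indices(num_frames: int, k: int) -> list[int]:
    """Pick k ordered frame indices spread uniformly over a video.

    Hypothetical helper: assumes segment-center uniform sampling, which the
    TempGlitch summary does not specify.
    """
    if num_frames <= 0 or k <= 0:
        raise ValueError("num_frames and k must be positive")
    # Cannot sample more distinct frames than the video contains.
    k = min(k, num_frames)
    # Split the video into k equal-length segments and take each segment's middle frame.
    seg = num_frames / k
    return [int(seg * i + seg / 2) for i in range(k)]


# Compare a sparse and a denser sampling setting on a 100-frame clip.
for k in (4, 8):
    print(k, sample_frame_indices(100, k))
```

A denser setting (larger k) only shortens the segment length, so the sampled frames cover the clip's motion more finely. The results summarized above suggest that this extra coverage alone does not reliably help current VLMs.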

📝 Abstract

Vision-language models (VLMs) are increasingly being explored for video game quality assurance, especially gameplay glitch detection. Most existing evaluations, however, treat glitches as static visual anomalies, asking models to detect failures from a single frame. We argue that this framing misses a key distinction: some glitches are spatial and visible in an isolated frame, whereas others are temporal and become evident only through changes across ordered frames. A preliminary study confirms this gap, showing that temporal glitches are substantially harder for VLMs to detect than spatial ones. To enable systematic evaluation of this underexplored setting, we introduce TempGlitch, a controlled gameplay video benchmark for temporal glitch detection. TempGlitch covers five temporal glitch types with balanced per-category samples, together with paired glitch-free videos that enable reliable binary evaluation. We evaluate 12 proprietary and open-weight VLMs across multiple frame-sampling settings. Our results show that current VLMs remain near chance on TempGlitch, often collapsing into either overly conservative behavior that misses most glitches or overly sensitive behavior that flags clean videos as glitchy. Moreover, denser frame sampling and larger model size do not reliably resolve these failures. TempGlitch provides a focused testbed for temporal reasoning, robust gameplay understanding, and automated glitch detection with VLMs. Code and data are available at the project website.
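Every glitchy video is paired with a glitch-free one, so the classes are balanced. Under that design, a model that always gives the same answer scores exactly 50% accuracy. The sketch below shows why accuracy alone cannot separate the two failure modes described in the abstract, while glitch recall and the false-alarm rate on clean videos can. It assumes a 1 = glitch / 0 = clean label encoding and uses hypothetical names (`evaluate`, `BinaryReport`). It illustrates standard binary metrics and is not the paper's scoring code. The toy predictions are not TempGlitch results.

```python
from dataclasses import dataclass


@dataclass
class BinaryReport:
    accuracy: float
    glitch_recall: float     # fraction of glitchy videos flagged as glitchy
    false_alarm_rate: float  # fraction of clean videos flagged as glitchy


def evaluate(labels: list[int], preds: list[int]) -> BinaryReport:
    """Score binary glitch predictions (hypothetical encoding: 1 = glitch, 0 = clean)."""
    if not labels or len(labels) != len(preds):
        raise ValueError("labels and preds must be non-empty and the same length")
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    pos, neg = tp + fn, fp + tn
    return BinaryReport(
        accuracy=(tp + tn) / len(labels),
        glitch_recall=tp / pos if pos else float("nan"),
        false_alarm_rate=fp / neg if neg else float("nan"),
    )


# Balanced toy set: each glitchy video (1) is paired with a clean counterpart (0).
labels = [1, 0] * 50
conservative = evaluate(labels, [0] * len(labels))  # never flags a glitch
sensitive = evaluate(labels, [1] * len(labels))     # flags every video
print(conservative)  # BinaryReport(accuracy=0.5, glitch_recall=0.0, false_alarm_rate=0.0)
print(sensitive)     # BinaryReport(accuracy=1.0 is impossible here; see the values below)
```

Both degenerate predictors land at 0.5 accuracy. The conservative one has zero glitch recall, and the sensitive one has recall 1.0 but also a false-alarm rate of 1.0. Reporting recall and false alarms next to accuracy makes the collapse visible.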

Problem

Research questions and friction points this paper is trying to address.

temporal glitch detection

vision-language models

gameplay videos

video anomaly detection

temporal reasoning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Temporal Glitch Detection

Vision-Language Models

Gameplay Video Benchmark