Training-free Detection of Generated Videos via Spatial-Temporal Likelihoods

πŸ“… 2026-03-16
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the growing threat of synthetic videos being exploited for disinformation, a challenge exacerbated by limitations in current detection approaches that either overlook temporal cues or rely on supervised training, thereby compromising generalization. To overcome these issues, the authors propose STALL, a training-free, model-agnostic, zero-shot method for video authenticity verification. STALL leverages a spatio-temporal joint probability model grounded in the statistical properties of real-world data, integrating spatial and temporal evidence through a theoretically justified likelihood scoring mechanism that discriminates generated videos from real ones. Evaluated on two established benchmarks and a newly introduced dataset, ComGenVid, STALL substantially outperforms existing image- and video-based detection baselines, marking the first demonstration of efficient and generalizable zero-shot detection of AI-generated videos.
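The core idea described above, scoring a video's spatial and temporal statistics jointly under a likelihood model of real-world data, can be illustrated with a toy sketch. Everything here is an assumption for illustration: the Laplace gradient model, the function name `spatial_temporal_score`, and the independence-based sum of log-likelihoods are not the paper's actual method, only a minimal zero-shot, training-free scorer in the same spirit.

```python
import numpy as np

def spatial_temporal_score(video, eps=1e-8):
    """Toy spatio-temporal likelihood score for a video (illustrative only).

    `video` is a float array of shape (T, H, W): T grayscale frames.
    Natural videos tend to have heavy-tailed spatial gradients and smooth
    temporal dynamics; here both are scored under a simple Laplace model
    whose scale is estimated from the video itself. Higher scores mean the
    video looks more consistent with this crude real-data model.
    """
    video = np.asarray(video, dtype=np.float64)

    # Spatial evidence: horizontal gradients of each frame, scored under
    # a zero-mean Laplace distribution.
    dx = np.diff(video, axis=2)
    b_s = np.mean(np.abs(dx)) + eps
    log_p_spatial = np.mean(-np.log(2.0 * b_s) - np.abs(dx) / b_s)

    # Temporal evidence: frame-to-frame differences, same Laplace model.
    dt = np.diff(video, axis=0)
    b_t = np.mean(np.abs(dt)) + eps
    log_p_temporal = np.mean(-np.log(2.0 * b_t) - np.abs(dt) / b_t)

    # Joint score: sum of average log-likelihoods, i.e. an independence
    # assumption between the spatial and temporal factors.
    return log_p_spatial + log_p_temporal
```

A detector built on such a score would threshold it, flagging videos whose likelihood under the real-data model falls below a calibration value; no synthetic training data is involved, which is what makes the approach model-agnostic.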

πŸ“ Abstract
Following major advances in text and image generation, the video domain has surged, producing highly realistic and controllable sequences. Along with this progress, these models also raise serious concerns about misinformation, making reliable detection of synthetic videos increasingly crucial. Image-based detectors are fundamentally limited because they operate per frame and ignore temporal dynamics, while supervised video detectors generalize poorly to unseen generators, a critical drawback given the rapid emergence of new models. These challenges motivate zero-shot approaches, which avoid synthetic data and instead score content against real-data statistics, enabling training-free, model-agnostic detection. We introduce \emph{STALL}, a simple, training-free, theoretically justified detector that provides likelihood-based scoring for videos, jointly modeling spatial and temporal evidence within a probabilistic framework. We evaluate STALL on two public benchmarks and introduce ComGenVid, a new benchmark with state-of-the-art generative models. STALL consistently outperforms prior image- and video-based baselines. Code and data are available at https://omerbenhayun.github.io/stall-video.
Problem

Research questions and friction points this paper is trying to address.

generated video detection
zero-shot detection
temporal dynamics
model-agnostic
misinformation
Innovation

Methods, ideas, or system contributions that make the work stand out.

training-free
zero-shot detection
spatial-temporal likelihood
model-agnostic
deepfake detection
πŸ”Ž Similar Papers