🤖 AI Summary
Addressing the dual challenges of lacking benchmark datasets and generalizable methods for AI-generated video detection, this paper introduces DeCoF, a novel detection framework grounded in frame-level temporal consistency modeling. To support systematic evaluation, we first construct an open-source video forgery dataset encompassing diverse generative models and real-world scenarios. DeCoF deliberately avoids reliance on spatial artifacts—often fragile under compression or post-processing—and instead focuses on temporal inconsistencies. It integrates prompt-driven synthetic data construction, explicit temporal feature modeling, and artifact-free representation learning to achieve cross-model generalization. Evaluated on unseen generative models, DeCoF achieves a mean detection accuracy of 92.7%. Notably, it demonstrates strong robustness against proprietary commercial models such as Sora and Veo. Both the dataset and implementation code are fully open-sourced to foster reproducible research.
📝 Abstract
The escalating quality of video generated by advanced video generation methods results in new security challenges, while there have been few relevant research efforts: 1) There is no open-source dataset for generated video detection, 2) No generated video detection method has been proposed so far. To this end, we propose an open-source dataset and a detection method for generated video for the first time. First, we propose a scalable dataset consisting of 964 prompts, covering various forgery targets, scenes, behaviors, and actions, as well as various generation models with different architectures and generation methods, including the most popular commercial models like OpenAI's Sora and Google's Veo. Second, we found via probing experiments that spatial artifact-based detectors lack generalizability. Hence, we propose a simple yet effective extbf{de}tection model based on extbf{f}rame extbf{co}nsistency ( extbf{DeCoF}), which focuses on temporal artifacts by eliminating the impact of spatial artifacts during feature learning. Extensive experiments demonstrate the efficacy of DeCoF in detecting videos generated by unseen video generation models and confirm its powerful generalizability across several commercially proprietary models. Our code and dataset will be released at https://github.com/wuwuwuyue/DeCoF.