🤖 AI Summary
Existing detectors for AI-generated video make limited use of temporal artifacts. To address this, the paper brings Newtonian second-order dynamics into video forgery detection, which the authors describe as a first, and proposes D3 (Detection by Difference of Differences), a training-free detection method. D3 computes inter-frame "acceleration" features via second-order central differences and uses the systematic gap between the acceleration distributions of real and synthetic videos, enabling zero-shot, plug-and-play detection. Experiments on four public benchmarks (40 subsets in total) show that D3 improves mean Average Precision on GenVideo by 10.39 percentage points (absolute) over the previous best method, with low computational cost and strong robustness to post-processing. The core contributions are a second-order dynamical framework tailored to video forgery detection and the observation that AI-generated videos show a consistent divergence from real videos in second-order temporal features.
📝 Abstract
The evolution of video generation techniques, such as Sora, has made it increasingly easy to produce high-fidelity AI-generated videos, raising public concern over the dissemination of synthetic content. However, existing detection methods remain limited by their insufficient exploration of temporal artifacts in synthetic videos. To bridge this gap, we establish a theoretical framework based on second-order dynamical analysis under Newtonian mechanics, and extend it to Second-order Central Difference features tailored for temporal artifact detection. Building on this theoretical foundation, we reveal a fundamental divergence in second-order feature distributions between real and AI-generated videos. Concretely, we propose Detection by Difference of Differences (D3), a novel training-free detection method that leverages these second-order temporal discrepancies. We evaluate D3 on 4 open-source datasets (GenVideo, VideoPhy, EvalCrafter, VidProM), 40 subsets in total. For example, on GenVideo, D3 outperforms the previous best method by 10.39% (absolute) in mean Average Precision. Additional experiments on time cost and post-processing operations demonstrate D3's computational efficiency and strong robustness. Our code is available at https://github.com/Zig-HS/D3.
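To make the "difference of differences" idea concrete, the following is a minimal sketch of a second-order central difference over per-frame feature vectors. It is not the authors' implementation (see the linked repository for that). The function names, the assumption that each frame has already been mapped to a feature vector by some encoder, and the choice of the standard deviation of acceleration magnitudes as a video-level score are all hypothetical and used only for illustration.

```python
import numpy as np


def second_order_central_difference(features: np.ndarray) -> np.ndarray:
    """Return f[t+1] - 2*f[t] + f[t-1] for every interior frame t.

    features: array of shape (T, D), one D-dimensional vector per frame.
    How the vectors are produced (which encoder, which layer) is an
    assumption left outside this sketch.
    """
    if features.ndim != 2 or features.shape[0] < 3:
        raise ValueError("need a (T, D) array with at least 3 frames")
    return features[2:] - 2.0 * features[1:-1] + features[:-2]


def acceleration_score(features: np.ndarray) -> float:
    """Hypothetical video-level statistic: spread of acceleration magnitudes.

    The abstract only says that second-order feature distributions differ
    between real and generated videos; this particular summary statistic
    is an illustrative assumption.
    """
    accel = second_order_central_difference(features)
    magnitudes = np.linalg.norm(accel, axis=1)  # one value per interior frame
    return float(magnitudes.std())


if __name__ == "__main__":
    # Synthetic stand-in for frame features: 16 frames, 8 dimensions.
    rng = np.random.default_rng(0)
    fake_features = rng.normal(size=(16, 8))
    print(second_order_central_difference(fake_features).shape)
    print(acceleration_score(fake_features))
```

A useful sanity check on the operator: features that change linearly over time (constant velocity) have zero second-order difference, while features that follow t² give a constant value of 2 in every dimension. How a score like this would be thresholded, and in which direction it separates real from synthetic videos, is not stated in the abstract and would need to be taken from the paper or its code.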