D3: Training-Free AI-Generated Video Detection Using Second-Order Features

📅 2025-08-01
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address insufficient temporal artifact exploitation in AI-generated video detection, this paper introduces, for the first time, Newtonian second-order dynamics modeling into video forgery detection, proposing D3—a training-free detection method. D3 computes inter-frame motion acceleration features via second-order central differencing, thereby characterizing systematic distributional discrepancies in acceleration between authentic and synthetic videos, enabling zero-shot, plug-and-play detection. Extensive experiments across four public benchmarks (40 subsets total) demonstrate that D3 achieves a 10.39% average AP gain over state-of-the-art methods on GenVideo, while exhibiting minimal computational overhead and strong robustness. The core contribution is the establishment of the first second-order dynamical systems framework tailored to video forgery detection, alongside the discovery of a universal acceleration-domain deficiency inherent to AI-generated videos.
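The summary's core operation, the second-order central difference, can be sketched in a few lines. This is a minimal illustration under assumptions not stated in the summary: frames are stacked into a NumPy array, and the per-video statistic is taken to be the mean absolute acceleration (the paper's actual D3 scoring rule may differ); the function names are hypothetical.

```python
import numpy as np

def second_order_central_difference(frames: np.ndarray) -> np.ndarray:
    """Discrete acceleration along the time axis:
    a_t = x_{t+1} - 2*x_t + x_{t-1}.
    frames: array of shape (T, H, W) or (T, H, W, C), T >= 3."""
    return frames[2:] - 2.0 * frames[1:-1] + frames[:-2]

def acceleration_score(frames: np.ndarray) -> float:
    """Hypothetical scalar summary of acceleration magnitude; a stand-in
    for the paper's distributional comparison, not its exact statistic."""
    accel = second_order_central_difference(frames.astype(np.float64))
    return float(np.mean(np.abs(accel)))
```

A sanity check on synthetic motion: frames moving at constant velocity (pixel value equal to t) have zero second difference, while quadratic motion (pixel value t^2) yields a constant acceleration of 2. A training-free detector in this spirit would threshold such a statistic, exploiting the acceleration-domain deficiency the paper reports for AI-generated videos.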

📝 Abstract
The evolution of video generation techniques, such as Sora, has made it increasingly easy to produce high-fidelity AI-generated videos, raising public concern over the dissemination of synthetic content. However, existing detection methodologies remain limited by their insufficient exploration of temporal artifacts in synthetic videos. To bridge this gap, we establish a theoretical framework through second-order dynamical analysis under Newtonian mechanics, subsequently extending the Second-order Central Difference features tailored for temporal artifact detection. Building on this theoretical foundation, we reveal a fundamental divergence in second-order feature distributions between real and AI-generated videos. Concretely, we propose Detection by Difference of Differences (D3), a novel training-free detection method that leverages the above second-order temporal discrepancies. We validate the superiority of our D3 on 4 open-source datasets (Gen-Video, VideoPhy, EvalCrafter, VidProM), 40 subsets in total. For example, on GenVideo, D3 outperforms the previous best method by 10.39% (absolute) mean Average Precision. Additional experiments on time cost and post-processing operations demonstrate D3's exceptional computational efficiency and strong robust performance. Our code is available at https://github.com/Zig-HS/D3.
Problem

Research questions and friction points this paper is trying to address.

Detect AI-generated videos without training
Identify temporal artifacts in synthetic videos
Leverage second-order feature distribution differences
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses second-order dynamical analysis framework
Leverages second-order temporal feature discrepancies
Training-free method for video detection
Authors

Chende Zheng (Xi’an Jiaotong University)
Ruiqi Suo (Xi’an Jiaotong University)
Chenhao Lin (Guangdong OPPO Mobile Communications Co., Ltd.)
Zhengyu Zhao (Xi’an Jiaotong University)
Le Yang (Xi’an Jiaotong University)
Shuai Liu (Xi’an Jiaotong University)
Minghui Yang (Ant Group)
Cong Wang (City University of Hong Kong)
Chao Shen (Xi’an Jiaotong University)