Reduced Spatial Dependency for More General Video-level Deepfake Detection

πŸ“… 2025-03-05
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Existing CNN-based deepfake detection methods suffer from poor cross-domain generalization because over-reliance on spatial features undermines temporal modeling. To address this, the authors propose Spatial Dependency Reduction (SDR), built on three key components: (1) Spatial Perturbation Branches (SPBs) that construct spatially-perturbed feature clusters to disentangle spatial bias; (2) a mutual-information-driven Task-Relevant Feature Integration (TRFI) module that captures the temporal consistency features shared across these clusters; and (3) a temporal Transformer that models long-range dependencies over the integrated features. Evaluated across multiple benchmarks, SDR consistently outperforms state-of-the-art methods, achieving an average 3.2% improvement in cross-dataset detection accuracy. Ablation studies confirm that both SPB and TRFI are critical to the framework's improved generalization.

πŸ“ Abstract
As a prominent form of AI-generated content, Deepfake has raised significant safety concerns. Although temporal consistency cues have been shown to offer better generalization capability, existing CNN-based methods inevitably introduce spatial bias, which hinders the extraction of intrinsic temporal features. To address this issue, we propose a novel method called Spatial Dependency Reduction (SDR), which integrates common temporal consistency features from multiple spatially-perturbed clusters to reduce the model's dependency on spatial information. Specifically, we design multiple Spatial Perturbation Branches (SPBs) to construct spatially-perturbed feature clusters. Subsequently, drawing on the theory of mutual information, we propose a Task-Relevant Feature Integration (TRFI) module to capture temporal features residing in a similar latent space across these clusters. Finally, the integrated feature is fed into a temporal Transformer to capture long-range dependencies. Extensive benchmarks and ablation studies demonstrate the effectiveness and rationale of our approach.
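The pipeline the abstract describes (spatially-perturbed branches → cluster-wise feature integration → temporal modeling) can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the block-masking perturbation, the per-frame statistics encoder, and the plain averaging used for integration are all simplified stand-ins (the paper's TRFI module is mutual-information-driven, and the final stage would be a temporal Transformer rather than this feature stack).

```python
import numpy as np

rng = np.random.default_rng(0)

def spatial_perturbation_branch(frames, rng):
    """Stand-in for an SPB: zero out a random spatial block in every frame,
    so this branch cannot rely on any fixed spatial cue."""
    T, H, W = frames.shape
    out = frames.copy()
    h = int(rng.integers(0, H // 2))
    w = int(rng.integers(0, W // 2))
    out[:, h:h + H // 4, w:w + W // 4] = 0.0
    return out

def frame_features(frames):
    """Stand-in frame encoder: per-frame mean/std statistics -> (T, 2)."""
    T = frames.shape[0]
    flat = frames.reshape(T, -1)
    return np.stack([flat.mean(axis=1), flat.std(axis=1)], axis=1)

def integrate_clusters(cluster_feats):
    """Stand-in for TRFI: average the temporal feature sequences shared
    across the spatially-perturbed clusters -> (T, 2)."""
    return np.mean(cluster_feats, axis=0)

video = rng.random((16, 32, 32))                     # 16 frames, 32x32 each
clusters = [frame_features(spatial_perturbation_branch(video, rng))
            for _ in range(4)]                       # 4 perturbed branches
integrated = integrate_clusters(np.stack(clusters))  # shape (16, 2)
print(integrated.shape)
```

The integrated `(T, features)` sequence is what a temporal model would consume to capture long-range consistency across frames; averaging over several differently-perturbed views is the crude analogue of extracting features that survive spatial perturbation.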
Problem

Research questions and friction points this paper is trying to address.

Reduces spatial bias in deepfake detection models
Enhances extraction of intrinsic temporal features
Improves generalization using temporal consistency cues
Innovation

Methods, ideas, or system contributions that make the work stand out.

Spatial Dependency Reduction (SDR) method
Spatial Perturbation Branch (SPB) clusters
Task-Relevant Feature Integration (TRFI) module
Beilin Chu
Beijing University of Posts and Telecommunications
AI · Multi-model learning · AIGC detection
Xuan Xu
School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing, China
Yufei Zhang
School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing, China
Weike You
School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing, China
Linna Zhou
School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing, China