Reduced Spatial Dependency for More General Video-level Deepfake Detection

📅 2025-03-05

📈 Citations: 0

✨ Influential: 0

career value

211K/year

🤖 AI Summary

Existing CNN-based deepfake detection methods suffer from poor cross-domain generalization due to over-reliance on spatial features, which undermines temporal modeling capability. To address this, we propose the Spatial Dependency Reduction Framework (SDRF), featuring three key innovations: (1) a Spatial Perturbation Branch (SPB) that explicitly disentangles spatial bias; (2) a mutual information-driven Temporal Relational Feature Integration (TRFI) module to strengthen intrinsic temporal consistency modeling; and (3) integration of a temporal Transformer with multi-cluster perturbed feature aggregation to enhance video-level discriminative robustness. Evaluated across multiple benchmarks, SDRF consistently outperforms state-of-the-art methods, achieving an average 3.2% improvement in cross-dataset detection accuracy. Ablation studies confirm that both SPB and TRFI are critical to the framework’s enhanced generalization performance.

Technology Category

Application Category

📝 Abstract

As one of the prominent AI-generated content, Deepfake has raised significant safety concerns. Although it has been demonstrated that temporal consistency cues offer better generalization capability, existing methods based on CNNs inevitably introduce spatial bias, which hinders the extraction of intrinsic temporal features. To address this issue, we propose a novel method called Spatial Dependency Reduction (SDR), which integrates common temporal consistency features from multiple spatially-perturbed clusters, to reduce the dependency of the model on spatial information. Specifically, we design multiple Spatial Perturbation Branch (SPB) to construct spatially-perturbed feature clusters. Subsequently, we utilize the theory of mutual information and propose a Task-Relevant Feature Integration (TRFI) module to capture temporal features residing in similar latent space from these clusters. Finally, the integrated feature is fed into a temporal transformer to capture long-range dependencies. Extensive benchmarks and ablation studies demonstrate the effectiveness and rationale of our approach.

Problem

Research questions and friction points this paper is trying to address.

Reduces spatial bias in deepfake detection models

Enhances extraction of intrinsic temporal features

Improves generalization using temporal consistency cues

Innovation

Methods, ideas, or system contributions that make the work stand out.

Spatial Dependency Reduction (SDR) method

Spatial Perturbation Branch (SPB) clusters

Task-Relevant Feature Integration (TRFI) module

🔎 Similar Papers

No similar papers found.