AI Summary
Existing CNN-based deepfake detection methods generalize poorly across domains because they rely too heavily on spatial features, which weakens their temporal modeling. To address this, the authors propose Spatial Dependency Reduction (SDR), which has three key components: (1) multiple Spatial Perturbation Branches (SPBs) that build spatially perturbed feature clusters to counter spatial bias; (2) a mutual information-driven Task-Relevant Feature Integration (TRFI) module that captures the temporal consistency features shared across those clusters; and (3) a temporal Transformer that models long-range dependencies over the integrated feature for video-level detection. Evaluated across multiple benchmarks, SDR consistently outperforms state-of-the-art methods, achieving an average 3.2% improvement in cross-dataset detection accuracy. Ablation studies confirm that both SPB and TRFI are critical to the framework's improved generalization.
Abstract
As a prominent form of AI-generated content, Deepfake has raised significant safety concerns. Although temporal consistency cues have been shown to offer better generalization capability, existing methods based on CNNs inevitably introduce spatial bias, which hinders the extraction of intrinsic temporal features. To address this issue, we propose a novel method called Spatial Dependency Reduction (SDR), which integrates common temporal consistency features from multiple spatially-perturbed clusters to reduce the model's dependency on spatial information. Specifically, we design multiple Spatial Perturbation Branches (SPBs) to construct spatially-perturbed feature clusters. Subsequently, we draw on mutual information theory and propose a Task-Relevant Feature Integration (TRFI) module to capture temporal features residing in a similar latent space across these clusters. Finally, the integrated feature is fed into a temporal transformer to capture long-range dependencies. Extensive benchmarks and ablation studies demonstrate the effectiveness and rationale of our approach.
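The pipeline described above (perturb spatially, extract temporal features per branch, integrate what the branches share) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the abstract does not specify the perturbation, so a fixed patch shuffle stands in for an SPB; a mean absolute frame difference stands in for a learned temporal feature; and a plain average across branches stands in for the mutual information-based TRFI module. The temporal transformer is omitted. All function names are hypothetical.

```python
import random


def make_permutation(num_patches, seed):
    """Hypothetical spatial perturbation: a fixed shuffle of patch positions."""
    rng = random.Random(seed)
    perm = list(range(num_patches))
    rng.shuffle(perm)
    return perm


def perturb_video(video, perm):
    """Apply the same patch permutation to every frame.

    The spatial layout changes, but each patch's evolution over time
    is left untouched.
    """
    return [[frame[i] for i in perm] for frame in video]


def temporal_descriptor(video):
    """Toy temporal feature: mean absolute patch change between consecutive frames."""
    feats = []
    for prev, curr in zip(video, video[1:]):
        diffs = [abs(c - p) for p, c in zip(prev, curr)]
        feats.append(sum(diffs) / len(diffs))
    return feats


def integrate(clusters):
    """Stand-in for TRFI (assumption): element-wise average across branches."""
    n = len(clusters)
    return [sum(vals) / n for vals in zip(*clusters)]


def sdr_sketch(video, num_branches=3):
    """Run several perturbation branches and integrate their temporal features."""
    num_patches = len(video[0])
    clusters = [
        temporal_descriptor(
            perturb_video(video, make_permutation(num_patches, seed))
        )
        for seed in range(num_branches)
    ]
    return integrate(clusters)


# A tiny "video": 3 frames, each flattened into 4 patch intensities.
video = [
    [0.0, 1.0, 2.0, 3.0],
    [1.0, 1.0, 2.0, 5.0],
    [1.0, 2.0, 2.0, 5.0],
]
integrated = sdr_sketch(video)
```

In this toy setting the temporal descriptor is invariant to the patch shuffle, so every branch yields the same feature and the integrated result equals the unperturbed one. A real CNN backbone is not invariant in this way, which is presumably why the method needs a dedicated module to isolate the temporal features that the perturbed clusters have in common.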