Vulnerability-Aware Spatio-Temporal Learning for Generalizable and Interpretable Deepfake Video Detection

📅 2025-01-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper proposes FakeSTormer to address three challenges in deepfake video detection: complex coupling of spatial and temporal artifacts, weak cross-domain generalization, and lack of interpretability. Methodologically: (1) it introduces a vulnerability-aware multi-task architecture with additional spatial and temporal branches that model the two kinds of artifacts separately for fine-grained localization; (2) it devises a video-level pseudo-fake synthesis algorithm that generates samples with subtle artifacts together with ground-truth annotations for both branches; and (3) it integrates self-supervised pseudo-labeling with spatiotemporal decoupled attention, enabling pixel-level attribution via saliency heatmaps. Evaluated on several challenging benchmarks, FakeSTormer is competitive with recent state-of-the-art methods while also providing interpretable predictions.

📝 Abstract

Detecting deepfake videos is highly challenging due to the complex intertwined spatial and temporal artifacts in forged sequences. Most recent approaches rely on binary classifiers trained on both real and fake data. However, such methods may struggle to focus on important artifacts, which can hinder their generalization capability. Additionally, these models often lack interpretability, making it difficult to understand how predictions are made. To address these issues, we propose FakeSTormer, offering two key contributions. First, we introduce a multi-task learning framework with additional spatial and temporal branches that enable the model to focus on subtle spatio-temporal artifacts. These branches also provide interpretability by highlighting video regions that may contain artifacts. Second, we propose a video-level data synthesis algorithm that generates pseudo-fake videos with subtle artifacts, providing the model with high-quality samples and ground truth data for our spatial and temporal branches. Extensive experiments on several challenging benchmarks demonstrate the competitiveness of our approach compared to recent state-of-the-art methods. The code is available at https://github.com/10Ring/FakeSTormer.
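The abstract does not spell out how the classification objective is combined with the spatial and temporal branches. As a rough illustration only, the sketch below shows one common way a multi-task objective of this kind can be assembled: a binary cross-entropy term for the real/fake decision plus weighted regression terms for a spatial map and a temporal signal. The function names, the choice of mean squared error, and the weights `w_spatial` and `w_temporal` are assumptions made for this example and are not taken from the paper or its repository.

```python
import math


def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one predicted probability p and label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def mse(pred, target):
    """Mean squared error between two equal-length flat lists of floats."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((a - b) ** 2 for a, b in zip(pred, target)) / len(pred)


def multitask_loss(cls_prob, label,
                   spatial_pred, spatial_gt,
                   temporal_pred, temporal_gt,
                   w_spatial=1.0, w_temporal=1.0):
    """Hypothetical combined objective (an assumption, not the paper's loss).

    cls_prob      : predicted probability that the video is fake
    label         : 1 for fake, 0 for real
    spatial_*     : flattened per-pixel artifact maps (prediction, ground truth)
    temporal_*    : per-frame artifact signals (prediction, ground truth)
    """
    loss_cls = bce(cls_prob, label)
    loss_spatial = mse(spatial_pred, spatial_gt)
    loss_temporal = mse(temporal_pred, temporal_gt)
    return loss_cls + w_spatial * loss_spatial + w_temporal * loss_temporal
```

In this sketch the auxiliary terms vanish when the predicted maps match the ground truth supplied by the synthesis step, so the extra branches only add a penalty when the model misplaces the artifacts; setting a weight to zero disables the corresponding branch.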

Problem

Research questions and friction points this paper is trying to address.

Deepfake Detection

Key Artifact Neglect

Weak Learning Capability

Innovation

Methods, ideas, or system contributions that make the work stand out.

Deepfake Detection

Anomaly Localization

Partially Fabricated Videos
