When Deepfake Detection Meets Graph Neural Network: A Unified and Lightweight Learning Framework

📅 2025-08-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing video forgery detection methods rely on cues from a single modality (spatial, temporal, or spectral), which leads to poor generalization and parameter-heavy models. To address this, we propose a lightweight graph neural network framework that, for the first time, unifies spatial, spectral, and temporal inconsistency modeling in the graph domain. Our approach constructs a structured graph representation of each video and learns spectral filtering and temporal differencing operations jointly within the graph architecture, enabling end-to-end inference without reliance on large pretrained models. Extensive experiments on multiple benchmarks demonstrate state-of-the-art performance in both in-domain and cross-domain settings. Notably, the method uses up to 42.4× fewer parameters than prior work while improving robustness to unseen manipulations and lowering computational cost for real-world deployment.

📝 Abstract

The proliferation of generative video models has made detecting AI-generated and manipulated videos an urgent challenge. Existing detection approaches often fail to generalize across diverse manipulation types due to their reliance on isolated spatial, temporal, or spectral information, and typically require large models to perform well. This paper introduces SSTGNN, a lightweight Spatial-Spectral-Temporal Graph Neural Network framework that represents videos as structured graphs, enabling joint reasoning over spatial inconsistencies, temporal artifacts, and spectral distortions. SSTGNN incorporates learnable spectral filters and temporal differential modeling into a graph-based architecture, capturing subtle manipulation traces more effectively. Extensive experiments on diverse benchmark datasets demonstrate that SSTGNN not only achieves superior performance in both in-domain and cross-domain settings, but also offers strong robustness against unseen manipulations. Remarkably, SSTGNN accomplishes these results with up to 42.4$\times$ fewer parameters than state-of-the-art models, making it highly lightweight and scalable for real-world deployment.
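To make the idea of "representing videos as structured graphs" with "temporal differential modeling" more concrete, here is a minimal sketch in plain Python. It is not SSTGNN's actual construction, which the abstract does not specify. It assumes, purely for illustration, that each frame is split into a grid of patches, that each patch is a node, that spatial edges link 4-neighbouring patches within a frame, and that temporal edges link the same patch position in consecutive frames. The names `build_video_graph` and `temporal_differences` are hypothetical.

```python
from itertools import product


def build_video_graph(num_frames, grid_h, grid_w):
    """Return (num_nodes, edges) for a patch grid repeated over frames.

    Assumption (not from the paper): one node per (frame, row, col) patch,
    numbered as (t * grid_h + r) * grid_w + c.
    """
    def node_id(t, r, c):
        return (t * grid_h + r) * grid_w + c

    edges = set()
    for t, r, c in product(range(num_frames), range(grid_h), range(grid_w)):
        if c + 1 < grid_w:        # spatial edge to the right neighbour
            edges.add((node_id(t, r, c), node_id(t, r, c + 1)))
        if r + 1 < grid_h:        # spatial edge to the neighbour below
            edges.add((node_id(t, r, c), node_id(t, r + 1, c)))
        if t + 1 < num_frames:    # temporal edge to the same patch in the next frame
            edges.add((node_id(t, r, c), node_id(t + 1, r, c)))
    return num_frames * grid_h * grid_w, sorted(edges)


def temporal_differences(features):
    """First-order temporal difference of per-patch scalar features.

    features[t][p] is the feature of patch p in frame t; the result has one
    fewer frame and holds features[t][p] - features[t - 1][p].
    """
    return [
        [cur - prev for cur, prev in zip(features[t], features[t - 1])]
        for t in range(1, len(features))
    ]
```

In a sketch like this, large temporal differences at a node would flag patches whose appearance changes abruptly between frames, which is one kind of temporal artifact a detector could attend to.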

Problem

Research questions and friction points this paper is trying to address.

Detect AI-generated videos across diverse manipulation types

Overcome limitations of isolated spatial, temporal, or spectral analysis

Reduce model size while maintaining detection performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight Spatial-Spectral-Temporal Graph Neural Network

Joint reasoning over spatial-temporal-spectral inconsistencies

Learnable spectral filters and temporal differential modeling
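One common way to realise a learnable spectral filter on a graph is a polynomial in the graph Laplacian, y = sum_k theta[k] * L^k x, where the coefficients theta are the trainable parameters. The sketch below shows that generic construction in plain Python. Whether SSTGNN uses this particular filter family is an assumption, since the summary does not say, and the helper names are hypothetical.

```python
def laplacian(num_nodes, edges):
    """Combinatorial Laplacian L = D - A of an undirected graph, as nested lists."""
    L = [[0.0] * num_nodes for _ in range(num_nodes)]
    for u, v in edges:
        L[u][v] -= 1.0
        L[v][u] -= 1.0
        L[u][u] += 1.0
        L[v][v] += 1.0
    return L


def matvec(M, x):
    """Dense matrix-vector product."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def polynomial_filter(L, x, theta):
    """Apply y = sum_k theta[k] * L^k x to a node signal x.

    theta plays the role of the learnable filter coefficients; here they are
    passed in by hand rather than trained.
    """
    y = [0.0] * len(x)
    power = list(x)                      # L^0 x
    for coeff in theta:
        y = [yi + coeff * pi for yi, pi in zip(y, power)]
        power = matvec(L, power)         # advance to the next power of L
    return y
```

Because a constant signal lies in the null space of L, only the `theta[0]` term affects smooth content, while higher-order terms respond to differences between neighbouring nodes. This is why such filters can emphasise local inconsistencies with very few parameters.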

🔎 Similar Papers

LatentForensics: Towards frugal deepfake detection in the StyleGAN latent space

2023-03-30

Citations: 0