🤖 AI Summary
Existing video forgery detection methods rely on isolated modal cues (spatial, temporal, or spectral), which leads to poor generalization and parameter-heavy models. To address this, we propose a lightweight graph neural network framework that, for the first time, unifies spatial-spectral-temporal inconsistency modeling in the graph domain. Our approach constructs a structured graph representation of videos and jointly learns spectral filtering and temporal differencing operations within the graph architecture, enabling end-to-end joint inference without reliance on large pretrained models. Extensive experiments demonstrate state-of-the-art performance both in-domain and cross-domain across multiple benchmarks. Notably, our method uses up to 42.4× fewer parameters than prior works while remaining robust to unseen manipulations, which makes it computationally efficient for real-world deployment.
📝 Abstract
The proliferation of generative video models has made detecting AI-generated and manipulated videos an urgent challenge. Existing detection approaches often fail to generalize across diverse manipulation types due to their reliance on isolated spatial, temporal, or spectral information, and typically require large models to perform well. This paper introduces SSTGNN, a lightweight Spatial-Spectral-Temporal Graph Neural Network framework that represents videos as structured graphs, enabling joint reasoning over spatial inconsistencies, temporal artifacts, and spectral distortions. SSTGNN incorporates learnable spectral filters and temporal differential modeling into a graph-based architecture, capturing subtle manipulation traces more effectively. Extensive experiments on diverse benchmark datasets demonstrate that SSTGNN not only achieves superior performance in both in-domain and cross-domain settings, but also offers strong robustness against unseen manipulations. Remarkably, SSTGNN accomplishes these results with up to 42.4$\times$ fewer parameters than state-of-the-art models, making it highly lightweight and scalable for real-world deployment.
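To make the two graph operations named in the abstract more concrete, the following is a minimal NumPy sketch, not the SSTGNN implementation. It assumes each frame is split into a grid of patches that become graph nodes with 4-neighbour edges, uses a polynomial of the normalized Laplacian as a stand-in for a learnable spectral filter, and takes frame-to-frame differences of node features as the temporal term. All function names, the grid construction, and the filter form are illustrative assumptions; the abstract does not specify them.

```python
import numpy as np

def grid_adjacency(h, w):
    """4-neighbour adjacency for an h x w grid of patch nodes (assumed layout)."""
    n = h * w
    A = np.zeros((n, n))
    for r in range(h):
        for c in range(w):
            i = r * w + c
            if c + 1 < w:                      # edge to the right neighbour
                A[i, i + 1] = A[i + 1, i] = 1.0
            if r + 1 < h:                      # edge to the neighbour below
                A[i, i + w] = A[i + w, i] = 1.0
    return A

def normalized_laplacian(A):
    """Symmetric normalized Laplacian: I - D^{-1/2} A D^{-1/2}."""
    d = A.sum(axis=1)
    d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(np.maximum(d, 1e-12)), 0.0)
    return np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def spectral_filter(L, X, theta):
    """Polynomial graph filter: sum_k theta[k] * L^k @ X.

    theta plays the role of the learnable coefficients (hypothetical form).
    X has shape (num_nodes, num_features).
    """
    out = np.zeros_like(X)
    Z = X.copy()                               # Z holds L^k @ X
    for t in theta:
        out += t * Z
        Z = L @ Z
    return out

def temporal_difference(X_seq):
    """Frame-to-frame feature differences: (T, N, F) -> (T-1, N, F)."""
    return X_seq[1:] - X_seq[:-1]

def sst_features(X_seq, L, theta):
    """Filter every frame spectrally, then difference the result over time."""
    filtered = np.stack([spectral_filter(L, X, theta) for X in X_seq])
    return filtered, temporal_difference(filtered)
```

A call such as `sst_features(X_seq, normalized_laplacian(grid_adjacency(2, 3)), [0.5, -0.25])` on an array of shape `(T, 6, F)` returns the filtered node features and their temporal differences. In a trained model the coefficients in `theta` would be learned end to end rather than fixed by hand.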