SpecSem-Net: Integrating Spectral and Semantic Features for Robust AI-generated Video Detection

📅 2026-05-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a weakness in existing AI-generated video detectors: because they rely predominantly on semantic features, they struggle to capture subtle artifacts in high-fidelity synthetic videos. The authors propose SpecSem-Net, a framework that they describe as the first to introduce a semantics-guided spectral denoising mechanism. It extracts high-frequency spectral features with Fourier-transform-based filtering, then fuses them with semantic context through a gated mechanism that adaptively suppresses spectral noise. The study also constructs a benchmark of videos generated by five state-of-the-art commercial models. SpecSem-Net reports detection accuracies of 87.25% on the newly established benchmark and 95.59% on public datasets, outperforming the existing methods it is compared against.

📝 Abstract

The remarkable visual fidelity of recent commercial video generative models, such as Sora and Veo, makes robust AI-generated video detection increasingly essential: synthetic content that is indistinguishable from real video can be exploited for disinformation. However, existing detectors often fail because they over-rely on semantic features, which are becoming increasingly realistic, and neglect subtle spectral artifacts. In this paper, we propose SpecSem-Net, the first framework to introduce a semantic-guided spectral denoising mechanism specifically for high-fidelity AI-generated video detection. Specifically, we design a spectral module that extracts high-frequency features via Fourier-transform-based filtering. Furthermore, to reduce misjudgments arising from spectral noise, we employ a Gated Merging Mechanism that adaptively fuses semantic context with the spectral features. Additionally, to evaluate detector performance on the latest top-tier generative models, we construct a comprehensive benchmark comprising videos from five SOTA commercial generators. Extensive experiments demonstrate that SpecSem-Net outperforms existing methods, achieving accuracies of 87.25% and 95.59% on our benchmark and public datasets, respectively.
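
The two ideas named in the abstract, Fourier-based high-frequency extraction and gated fusion with semantic context, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `high_pass_spectrum` and `gated_merge`, the radial cutoff of 0.25 cycles per pixel, and the sigmoid-gate form (with hypothetical parameters `w_gate` and `b_gate`) are all assumptions chosen only to make the concepts concrete.

```python
import numpy as np


def high_pass_spectrum(frame, cutoff=0.25):
    """Return the high-frequency part of a 2-D grayscale frame.

    Assumption: a hard radial mask in the Fourier domain; the paper's
    actual filter design is not specified in the abstract.
    """
    h, w = frame.shape
    # Centered 2-D spectrum of the frame.
    spectrum = np.fft.fftshift(np.fft.fft2(frame))
    # Frequency coordinates in cycles per pixel, range [-0.5, 0.5).
    fy = np.fft.fftshift(np.fft.fftfreq(h))[:, None]
    fx = np.fft.fftshift(np.fft.fftfreq(w))[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    # Zero out everything at or below the cutoff radius (including DC).
    mask = radius > cutoff
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * mask))
    return filtered.real


def gated_merge(spectral, semantic, w_gate, b_gate):
    """Blend two feature vectors with a gate driven by the semantic one.

    Assumption: gate = sigmoid(semantic @ w_gate + b_gate), applied
    elementwise; this is one common gating form, not the paper's.
    """
    gate = 1.0 / (1.0 + np.exp(-(semantic @ w_gate + b_gate)))
    # Gate near 1 trusts the spectral cue; near 0 falls back to semantics.
    return gate * spectral + (1.0 - gate) * semantic


# Usage: a checkerboard (highest spatial frequency) on a constant offset.
rows, cols = np.indices((8, 8))
checker = np.where((rows + cols) % 2 == 0, 1.0, -1.0)
residual = high_pass_spectrum(checker + 3.0)  # offset is removed

fused = gated_merge(
    spectral=np.array([1.0, 2.0]),
    semantic=np.array([3.0, 4.0]),
    w_gate=np.zeros((2, 2)),  # zero weights give a gate of exactly 0.5
    b_gate=np.zeros(2),
)
```

In the usage example, the constant offset sits at zero frequency and is removed, while the checkerboard pattern lies above the cutoff and passes through unchanged. With zero gate weights the fusion reduces to a plain average of the two feature vectors. In a trained model the gate parameters would be learned, so that the semantic context decides how much of the spectral signal to keep.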

Problem

Research questions and friction points this paper is trying to address.

AI-generated video detection

spectral artifacts

semantic features

deepfake detection

video forensics

Innovation

Methods, ideas, or system contributions that make the work stand out.

spectral denoising

semantic-guided fusion

AI-generated video detection

Fourier-transform features

gated merging mechanism
