Deepfake Detection that Generalizes Across Benchmarks

📅 2025-08-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Deepfake detectors generalize poorly to unseen manipulation techniques, which hinders real-world deployment. To address this, we propose a lightweight, robust cross-dataset detection framework: starting from a pre-trained CLIP vision encoder, we fine-tune only its Layer Normalization parameters—merely 0.03% of the total—and apply L2 normalization and latent-space augmentation to shape a compact hyperspherical feature manifold. Training under paired real-fake video supervision further improves generalization to previously unseen forgery methods. Evaluated on 13 benchmark datasets, the method achieves state-of-the-art average cross-dataset AUROC, demonstrating that strong generalization is attainable with minimal parameter overhead and pointing toward efficient, deployable deepfake detection.

📝 Abstract
The generalization of deepfake detectors to unseen manipulation techniques remains a challenge for practical deployment. Although many approaches adapt foundation models by introducing significant architectural complexity, this work demonstrates that robust generalization is achievable through a parameter-efficient adaptation of a pre-trained CLIP vision encoder. The proposed method, LNCLIP-DF, fine-tunes only the Layer Normalization parameters (0.03% of the total) and enhances generalization by enforcing a hyperspherical feature manifold using L2 normalization and latent space augmentations. We conducted an extensive evaluation on 13 benchmark datasets spanning from 2019 to 2025. The proposed method achieves state-of-the-art performance, outperforming more complex, recent approaches in average cross-dataset AUROC. Our analysis yields two primary findings for the field: 1) training on paired real-fake data from the same source video is essential for mitigating shortcut learning and improving generalization, and 2) detection difficulty on academic datasets has not strictly increased over time, with models trained on older, diverse datasets showing strong generalization capabilities. This work delivers a computationally efficient and reproducible method, proving that state-of-the-art generalization is attainable by making targeted, minimal changes to a pre-trained CLIP model. The code will be made publicly available upon acceptance.
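The abstract's first finding—training on paired real-fake data from the same source video to mitigate shortcut learning—amounts to a constrained batch-sampling strategy. A minimal stdlib sketch of such a sampler is below; the `videos` dictionary, clip identifiers, and `paired_batch` helper are hypothetical illustrations, not the paper's actual data pipeline:

```python
import random

# Hypothetical index mapping each source video to its real and fake clips.
videos = {
    "vid_a": {"real": ["a_r0", "a_r1"], "fake": ["a_f0", "a_f1"]},
    "vid_b": {"real": ["b_r0"], "fake": ["b_f0", "b_f1"]},
}

def paired_batch(videos, batch_size, rng=random):
    """Draw (real, fake) pairs from the SAME source video, so the only
    systematic difference the model can exploit is the manipulation itself."""
    ids = rng.choices(list(videos), k=batch_size)
    return [(rng.choice(videos[v]["real"]), rng.choice(videos[v]["fake"]))
            for v in ids]

batch = paired_batch(videos, batch_size=4, rng=random.Random(0))
```

Because both clips in a pair share identity, background, and compression settings, nuisance cues that would otherwise separate real from fake sources are held constant within each pair.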
Problem

Research questions and friction points this paper is trying to address.

Generalizing deepfake detection across unseen manipulation techniques
Achieving robust generalization with minimal parameter adaptation
Enhancing detection performance using hyperspherical feature manifold
Innovation

Methods, ideas, or system contributions that make the work stand out.

Parameter-efficient CLIP adaptation via Layer Normalization tuning
Hyperspherical feature manifold with L2 normalization
Latent space augmentations enhance cross-dataset generalization
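The adaptation described above—freezing everything but Layer Normalization, then projecting features onto the unit hypersphere—can be sketched in PyTorch. This is a toy illustration only: the small `TransformerEncoder` stands in for the pre-trained CLIP ViT, the mean-pooling and the 0.05 jitter scale for the latent augmentation are assumptions, and the real parameter fraction (0.03%) applies to the full CLIP model, not this stand-in:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in for CLIP's vision encoder (the paper fine-tunes a pre-trained
# CLIP ViT; this toy encoder is only for illustration).
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2,
)

# Freeze all parameters, then re-enable only LayerNorm weights and biases.
for p in encoder.parameters():
    p.requires_grad = False
for m in encoder.modules():
    if isinstance(m, nn.LayerNorm):
        for p in m.parameters():
            p.requires_grad = True

trainable = sum(p.numel() for p in encoder.parameters() if p.requires_grad)
total = sum(p.numel() for p in encoder.parameters())
print(f"trainable: {trainable}/{total} ({100 * trainable / total:.2f}%)")

# L2-normalize pooled features so they lie on the unit hypersphere.
x = torch.randn(8, 16, 64)          # (batch, tokens, dim)
feats = encoder(x).mean(dim=1)      # mean-pool token embeddings
feats = F.normalize(feats, dim=-1)  # hyperspherical embedding

# Latent-space augmentation (assumed form: small Gaussian jitter,
# then re-projection onto the hypersphere).
aug = F.normalize(feats + 0.05 * torch.randn_like(feats), dim=-1)
```

Only the LayerNorm affine parameters receive gradients, so an optimizer built from `filter(lambda p: p.requires_grad, encoder.parameters())` updates a tiny fraction of the network while the pre-trained weights stay fixed.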