🤖 AI Summary
Existing deepfake detection methods generalize poorly to unseen forgery patterns, leaving them ill-equipped for increasingly realistic cross-domain and cross-manipulation threats. To address this, we propose a lightweight, generalizable detection framework with three key components: (1) a deepfake-specific representation learning mechanism that explicitly models forensic traces; (2) a parameter-free feature-space redistribution strategy coupled with classification-invariant feature augmentation to improve cross-domain robustness; and (3) the synergistic integration of large-vision-model transfer, local patch-discontinuity modeling, and parameter-free data augmentation. With only 0.28M trainable parameters, the method achieves state-of-the-art generalization under both cross-domain and cross-manipulation evaluation protocols, markedly reducing the sharp performance degradation commonly observed on unknown forgeries.
📝 Abstract
The rapid advancement of generative artificial intelligence has enabled the creation of highly realistic fake facial images, posing serious threats to personal privacy and the integrity of online information. Existing deepfake detection methods often rely on handcrafted forensic cues and complex architectures, achieving strong performance in intra-domain settings but degrading significantly when confronted with unseen forgery patterns. In this paper, we propose GenDF, a simple yet effective framework that transfers a powerful large-scale vision model to the deepfake detection task with a compact, streamlined network design. GenDF incorporates deepfake-specific representation learning to capture discriminative patterns that separate real from fake facial images, feature-space redistribution to mitigate distribution mismatch, and a classification-invariant feature augmentation strategy that enhances generalization without introducing additional trainable parameters. Extensive experiments demonstrate that GenDF achieves state-of-the-art generalization performance in cross-domain and cross-manipulation settings while requiring only 0.28M trainable parameters, validating the effectiveness and efficiency of the proposed framework.
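To make the "parameter-free, classification-invariant augmentation" idea concrete, here is a minimal sketch of one common way such a step can be realized: perturbing a feature vector's first- and second-order statistics with small random factors, so the feature distribution is diversified without adding any trainable parameters. This is an illustrative assumption, not GenDF's published mechanism; the function name `augment_features` and the perturbation scale `alpha` are hypothetical.

```python
import random
import statistics

def augment_features(feat, alpha=0.1, rng=None):
    """Parameter-free feature augmentation sketch (assumed scheme):
    resample the feature vector's mean and standard deviation within
    +/- alpha of their original values, then re-standardize the
    features under the perturbed statistics. No learnable parameters
    are introduced, and the class-relevant relative structure of the
    features is preserved.
    """
    rng = rng or random.Random(0)
    mu = statistics.fmean(feat)
    sigma = statistics.pstdev(feat) or 1.0  # guard against zero variance
    # Sample perturbed statistics near the originals.
    new_mu = mu * (1.0 + rng.uniform(-alpha, alpha))
    new_sigma = sigma * (1.0 + rng.uniform(-alpha, alpha))
    # Standardize with the old statistics, rescale with the new ones.
    return [(x - mu) / sigma * new_sigma + new_mu for x in feat]

feats = [0.2, 1.5, -0.7, 0.9]
aug = augment_features(feats)
```

At training time such a step would typically be applied on the fly to intermediate features of real and fake samples alike, which is consistent with the paper's claim of improving cross-domain robustness at zero parameter cost.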