🤖 AI Summary
Existing deepfake detectors rely heavily on domain-specific forensic traces. As a result, they generalize poorly across domains and are not robust to unseen manipulation techniques. To address this, we propose DeepShield, a detection framework that jointly models local and global forgery characteristics. A local patch-guided mechanism captures fine-grained anomalies, and global forgery diversity modeling improves adaptability across manipulation types and datasets. Built on the CLIP-ViT architecture, the framework combines spatiotemporal artifact modeling, patch-level supervision, domain-aware feature enhancement, and boundary-expanded feature generation to support multi-scale forgery analysis. In extensive experiments, it outperforms state-of-the-art approaches on cross-dataset and cross-manipulation benchmarks. Notably, it achieves substantial accuracy gains in zero-shot forgery detection settings, where the training data excludes the target manipulation type, which indicates improved generalization to unseen attacks and greater overall detection robustness.
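Patch-level supervision can be illustrated with a minimal sketch in plain Python. It assumes, hypothetically, that each ViT patch receives a binary "manipulated" label derived from a pixel-level forgery mask and that the patch logits are trained with a per-patch binary cross-entropy loss. The helper names `patch_labels_from_mask` and `patch_bce_loss`, the 0.5 area threshold, and the labeling rule are illustrative assumptions, not details taken from the paper.

```python
import math
from typing import List, Sequence


def patch_labels_from_mask(
    mask: Sequence[Sequence[int]], patch: int, thresh: float = 0.5
) -> List[int]:
    """Hypothetical labeling rule: a non-overlapping patch x patch block is
    labeled 1 (manipulated) if more than `thresh` of its pixels are marked
    as forged in the binary mask. Partial blocks at the border are dropped.
    Labels are returned in row-major order, matching ViT patch order."""
    h, w = len(mask), len(mask[0])
    labels: List[int] = []
    for r in range(0, h - h % patch, patch):
        for c in range(0, w - w % patch, patch):
            block = [mask[i][j] for i in range(r, r + patch) for j in range(c, c + patch)]
            labels.append(1 if sum(block) / len(block) > thresh else 0)
    return labels


def patch_bce_loss(logits: Sequence[float], labels: Sequence[int]) -> float:
    """Mean binary cross-entropy over per-patch logits.

    Uses the numerically stable form max(z, 0) - z*y + log(1 + exp(-|z|)),
    which equals -[y*log(sigmoid(z)) + (1-y)*log(1 - sigmoid(z))]."""
    if len(logits) != len(labels):
        raise ValueError("one logit per patch label is required")
    total = 0.0
    for z, y in zip(logits, labels):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


# Usage: a 4x4 mask whose top-left 2x2 block is forged.
mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
labels = patch_labels_from_mask(mask, patch=2)
loss = patch_bce_loss([0.0, 0.0, 0.0, 0.0], labels)  # uninformative logits
```

In a full detector, a loss like this on patch tokens would typically be added to the image-level real/fake classification loss, so that the encoder is pushed to localize inconsistencies rather than rely on a single global cue.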
📝 Abstract
Recent advances in deep generative models have made face videos easier to manipulate, raising serious concerns about misuse for fraud and misinformation. Existing detectors often perform well in in-domain settings but fail to generalize across manipulation techniques because they rely on forgery-specific artifacts. In this work, we introduce DeepShield, a novel deepfake detection framework that balances local sensitivity and global generalization to improve robustness to unseen forgeries. DeepShield enhances the CLIP-ViT encoder with two key components: Local Patch Guidance (LPG) and Global Forgery Diversification (GFD). LPG applies spatiotemporal artifact modeling and patch-wise supervision to capture fine-grained inconsistencies that global models often overlook. GFD introduces domain feature augmentation, using domain-bridging and boundary-expanding feature generation to synthesize diverse forgeries, which mitigates overfitting and improves cross-domain adaptability. By integrating local and global analysis, DeepShield outperforms state-of-the-art methods in cross-dataset and cross-manipulation evaluations and achieves superior robustness against unseen deepfake attacks.
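To make domain-bridging and boundary-expanding feature generation more concrete, here is a minimal sketch on plain feature vectors. It assumes, purely for illustration, that domain bridging means convex interpolation between features from two different forgery domains, and that boundary expansion means pushing a forgery feature farther from the centroid of real-face features. The function names `bridge`, `expand`, and `augment`, the `beta` parameter, and the centroid-based rule are hypothetical and are not taken from the paper.

```python
import random
from typing import Dict, List, Sequence

Vector = List[float]


def bridge(f_a: Sequence[float], f_b: Sequence[float], lam: float) -> Vector:
    """Hypothetical domain-bridging: convex mix of two forgery features."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [(1.0 - lam) * a + lam * b for a, b in zip(f_a, f_b)]


def expand(f_fake: Sequence[float], real_centroid: Sequence[float], beta: float) -> Vector:
    """Hypothetical boundary-expanding: move a forgery feature away from the
    real-class centroid by scaling its offset from the centroid by (1 + beta)."""
    if beta <= 0.0:
        raise ValueError("beta must be positive")
    return [x + beta * (x - c) for x, c in zip(f_fake, real_centroid)]


def augment(
    fakes_by_domain: Dict[str, List[Vector]],
    real_centroid: Sequence[float],
    n: int,
    beta: float = 0.5,
    seed: int = 0,
) -> List[Vector]:
    """Generate n extra fake features: bridge two randomly chosen forgery
    domains, then push the mixture outward. Requires at least two domains."""
    rng = random.Random(seed)
    domains = list(fakes_by_domain)
    if len(domains) < 2:
        raise ValueError("need features from at least two forgery domains")
    out: List[Vector] = []
    for _ in range(n):
        dom_a, dom_b = rng.sample(domains, 2)  # two distinct domains
        f_a = rng.choice(fakes_by_domain[dom_a])
        f_b = rng.choice(fakes_by_domain[dom_b])
        mixed = bridge(f_a, f_b, rng.random())  # random() returns a value in [0, 1)
        out.append(expand(mixed, real_centroid, beta))
    return out


# Usage: two toy forgery domains with 2-D features and a real-class centroid.
fakes = {"face_swap": [[0.0, 0.0]], "reenactment": [[1.0, 1.0]]}
synthetic = augment(fakes, real_centroid=[5.0, 5.0], n=3)
```

Under these assumptions, the synthetic vectors would be added to training with the fake label. With `beta = 0.5`, each one lies 1.5 times as far from the real centroid as the mixture it was built from, which widens the region of feature space the classifier treats as forged.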