DFCon: Attention-Driven Supervised Contrastive Learning for Robust Deepfake Detection

📅 2025-01-28
🤖 AI Summary
To improve the generalization and robustness of deepfake face detection in unconstrained, real-world scenarios, this paper proposes an attention-driven supervised contrastive learning framework. The method combines three heterogeneous backbones: MaxViT (convolution layers with strided attention), CoAtNet (a convolution-attention hybrid), and EVA-02 (pretrained via masked image modeling). They were chosen to capture local, multi-scale, and global features, respectively. Each backbone is first fine-tuned with a supervised contrastive loss and then frozen while its own classification head is trained. A majority-voting ensemble combines the three predictions. The framework achieves 95.83% accuracy on the DFWild-Cup validation dataset.
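
The majority-voting step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the label convention (0 = real, 1 = fake), the dictionary keys, and the per-model predictions are hypothetical values chosen for the example.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by most models for one image.

    `predictions` holds one label per model (assumed: 0 = real, 1 = fake).
    With three voters and two classes a tie cannot occur.
    """
    label, _count = Counter(predictions).most_common(1)[0]
    return label


# Hypothetical hard predictions for four images, one list per backbone.
model_outputs = {
    "maxvit": [1, 0, 1, 0],
    "coatnet": [1, 1, 0, 0],
    "eva02": [0, 1, 1, 0],
}

# zip(*...) regroups the lists so each tuple holds the three votes for one image.
ensemble = [majority_vote(votes) for votes in zip(*model_outputs.values())]
print(ensemble)  # [1, 1, 1, 0]
```

An odd number of voters means a single dissenting model never changes the outcome, which is one way an ensemble can smooth over the weaknesses of individual backbones.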

📝 Abstract
This report presents our approach for the IEEE SP Cup 2025: Deepfake Face Detection in the Wild (DFWild-Cup), focusing on detecting deepfakes across diverse datasets. Our methodology employs three backbone models, MaxViT, CoAtNet, and EVA-02, fine-tuned with a supervised contrastive loss to enhance feature separation. These models were chosen for their complementary strengths. The integration of convolution layers and strided attention in MaxViT is well suited to detecting local features. The hybrid use of convolution and attention mechanisms in CoAtNet captures multi-scale features. The robust pretraining of EVA-02 with masked image modeling helps it capture global features. After this training stage, we freeze the parameters of the backbones and train only the classification heads. Finally, a majority-voting ensemble combines the predictions of the three models, improving robustness and generalization to unseen scenarios. The proposed system addresses the challenges of detecting deepfakes in real-world conditions and achieves an accuracy of 95.83% on the validation dataset.
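
For readers unfamiliar with supervised contrastive loss, here is a minimal pure-Python sketch of one common formulation (the one introduced by Khosla et al.). Each anchor is pulled toward the other samples in the batch that share its label and pushed away from the rest. The abstract does not state which variant, temperature, or batch construction was used, so the function name `supcon_loss`, the temperature of 0.1, and the toy 2-D embeddings are all assumptions made for illustration.

```python
import math


def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss averaged over anchors that have a positive.

    `embeddings` are assumed to be L2-normalised vectors (lists of floats),
    `labels` the matching class ids. The temperature value is an assumption.
    """
    n = len(embeddings)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no same-class partner contributes nothing
        # Scaled similarity between the anchor and every other sample.
        logits = {
            j: sum(a * b for a, b in zip(embeddings[i], embeddings[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp for the denominator.
        peak = max(logits.values())
        log_denom = peak + math.log(sum(math.exp(v - peak) for v in logits.values()))
        # Mean negative log-probability of picking a positive.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy batch: two tight clusters in 2-D (hypothetical embeddings).
batch = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
separated = supcon_loss(batch, [0, 0, 1, 1])  # labels agree with the clusters
mixed = supcon_loss(batch, [0, 1, 0, 1])      # labels cut across the clusters
print(separated < mixed)
```

The loss is small when same-class embeddings sit together and large when they do not, which is the "feature separation" the abstract refers to. A linear classification head trained on frozen features benefits from that separation.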
Problem

Research questions and friction points this paper is trying to address.

Deepfake Detection
Accuracy Improvement
In-the-Wild Detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Attention Mechanism
Supervised Contrastive Learning
Ensemble Model Voting