Improving Generalization in Deepfake Detection with Face Foundation Models and Metric Learning

📅 2025-08-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Deepfake detection models suffer from poor generalization across domains and in real-world scenarios. To address this, we propose a novel detection framework integrating a face foundation model with metric learning. Specifically, we leverage a self-supervised pre-trained face model (FSFM) to extract robust facial representations, followed by joint fine-tuning on multi-source deepfake datasets. We introduce a variant of triplet loss that explicitly enhances separability between authentic and manipulated samples in the embedding space. Additionally, we incorporate dual attribution supervision—based on both manipulation type and data source—to improve out-of-distribution generalization. Extensive experiments across diverse benchmarks, including real-world video collections, demonstrate that our method significantly outperforms existing state-of-the-art approaches, achieving substantial gains in both cross-domain robustness and detection accuracy.
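The summary's "variant of triplet loss" is not specified further on this page; as a reference point, the standard triplet loss it builds on can be sketched as below. The embeddings, margin value, and real/fake sampling are illustrative, not the paper's actual configuration:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss: pull the anchor toward the positive
    (same class, e.g. both real) and push it at least `margin`
    farther from the negative (e.g. a fake) in embedding space."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(d_pos - d_neg + margin, 0.0)

# Toy 2-D embeddings: the two real samples are close, the fake is far.
real_a = np.array([1.0, 0.0])
real_b = np.array([0.9, 0.1])
fake   = np.array([-1.0, 0.0])
print(triplet_loss(real_a, real_b, fake))  # already separated -> 0.0
```

During training, such a term is typically added to the classification loss so that real and fake samples form separable clusters rather than merely being classified correctly.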

📝 Abstract
The increasing realism and accessibility of deepfakes have raised critical concerns about media authenticity and information integrity. Despite recent advances, deepfake detection models often struggle to generalize beyond their training distributions, particularly when applied to media content found in the wild. In this work, we present a robust video deepfake detection framework with strong generalization that takes advantage of the rich facial representations learned by face foundation models. Our method is built on top of FSFM, a self-supervised model trained on real face data, and is further fine-tuned using an ensemble of deepfake datasets spanning both face-swapping and face-reenactment manipulations. To enhance discriminative power, we incorporate triplet loss variants during training, guiding the model to produce more separable embeddings between real and fake samples. Additionally, we explore attribution-based supervision schemes, where deepfakes are categorized by manipulation type or source dataset, to assess their impact on generalization. Extensive experiments across diverse evaluation benchmarks demonstrate the effectiveness of our approach, especially in challenging real-world scenarios.
Problem

Research questions and friction points this paper is trying to address.

Building deepfake detectors that generalize beyond their training data
Leveraging face foundation models for improved facial representation
Enhancing discriminative power with metric learning techniques
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages face foundation models for facial representations
Uses triplet loss variants for enhanced discriminative power
Incorporates attribution-based supervision by manipulation type and source dataset
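The attribution-based supervision described above replaces the binary real/fake label with fine-grained classes attributed to how and where a fake was produced. A minimal sketch of such a labeling scheme, with hypothetical manipulation and dataset names (the actual taxonomy used in the paper is not given on this page):

```python
# Hypothetical attribution taxonomy; the paper's actual categories may differ.
MANIPULATIONS = ["face_swap", "face_reenactment"]
SOURCES = ["dataset_a", "dataset_b", "dataset_c"]

def attribution_label(is_real, manipulation=None, source=None):
    """Map a sample to a class index: 0 = real, otherwise one class
    per (manipulation type, source dataset) combination."""
    if is_real:
        return 0
    m = MANIPULATIONS.index(manipulation)
    s = SOURCES.index(source)
    return 1 + m * len(SOURCES) + s

print(attribution_label(True))                                  # real -> 0
print(attribution_label(False, "face_reenactment", "dataset_b"))  # -> 5
```

A classifier head trained on these fine-grained labels must learn manipulation-specific cues, which is the mechanism by which such supervision is expected to aid out-of-distribution generalization.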
Stelios Mylonas
Centre for Research and Technology Hellas, Information Technologies Institute, Thessaloniki, Greece
Symeon Papadopoulos
Information Technologies Institute (ITI)
Artificial Intelligence · Media Verification · AI Fairness · Web Mining · Multimedia Retrieval