When Deepfakes Look Real: Detecting AI-Generated Faces with Unlabeled Data due to Annotation Challenges

📅 2025-08-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
The increasing photorealism of AI-generated faces undermines human annotation reliability, while existing supervised deepfake detection methods suffer severe performance degradation on unlabeled social media data due to distribution shift. Method: This paper proposes an unsupervised deepfake detection framework based on a dual-path network that jointly integrates text-guided cross-domain vision–semantics alignment, curriculum-based pseudo-label optimization, and cross-domain knowledge distillation—thereby mitigating both distribution shift and catastrophic forgetting. Crucially, learnable prompts are employed to enable robust multimodal embedding alignment. Contribution/Results: To our knowledge, this is the first work achieving robust unsupervised detection under highly overlapping real/fake face distributions. Evaluated on 11 mainstream benchmarks, our method achieves an average accuracy gain of +6.3% over state-of-the-art approaches, significantly improving unlabeled-data utilization and generalization across domains.
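The summary above mentions cross-domain knowledge distillation as the mechanism against catastrophic forgetting. The paper's exact loss is not given here; a generic temperature-scaled distillation term in the style of Hinton et al. (2015), with illustrative names and temperature, could be sketched as:

```python
import numpy as np

def distill_loss(student_logits, teacher_logits, T=2.0):
    """Generic knowledge-distillation loss (illustrative sketch,
    not the paper's exact formulation).

    KL divergence between the temperature-softened teacher and
    student distributions; the T**2 factor keeps gradient
    magnitudes comparable across temperatures.
    """
    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    p = softmax(np.asarray(teacher_logits) / T)  # frozen teacher targets
    q = softmax(np.asarray(student_logits) / T)  # student predictions
    return float(T * T * np.sum(p * (np.log(p) - np.log(q))))
```

When the student matches the teacher exactly the loss is zero, so the term only penalizes drift away from the source-domain teacher while the student adapts to the new domain.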

📝 Abstract
Existing deepfake detection methods depend heavily on labeled training data. However, as AI-generated content becomes increasingly realistic, even human annotators struggle to distinguish deepfakes from authentic images. This makes the labeling process both time-consuming and less reliable, creating a growing demand for approaches that can effectively exploit large-scale unlabeled data from online social networks. Unlike typical unsupervised learning tasks, where categories are distinct, AI-generated faces closely mimic real-image distributions and share strong similarities with them, causing performance drops in conventional strategies. In this paper, we introduce the Dual-Path Guidance Network (DPGNet) to tackle two key challenges: (1) bridging the domain gap between faces produced by different generation models, and (2) exploiting unlabeled image samples. The method features two core modules: text-guided cross-domain alignment, which uses learnable prompts to unify visual and textual embeddings in a domain-invariant feature space, and curriculum-driven pseudo-label generation, which dynamically exploits the more informative unlabeled samples. To prevent catastrophic forgetting, we also bridge domains via cross-domain knowledge distillation. Extensive experiments on 11 popular datasets show that DPGNet outperforms state-of-the-art approaches by 6.3%, highlighting its effectiveness in leveraging unlabeled data to address the annotation challenges posed by the increasing realism of deepfakes.
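The abstract describes curriculum-driven pseudo-label generation that admits progressively harder unlabeled samples. The paper's schedule is not specified here; one common way to realize such a curriculum, with illustrative thresholds, is to anneal a confidence cutoff over training:

```python
import numpy as np

def select_pseudo_labels(probs, epoch, max_epochs,
                         tau_start=0.95, tau_end=0.7):
    """Curriculum-style pseudo-label selection (illustrative sketch;
    thresholds and schedule are assumptions, not the paper's values).

    probs      : (N,) predicted probability of "fake" for unlabeled samples
    epoch      : current training epoch
    max_epochs : length of the curriculum

    The confidence threshold tau is annealed linearly from tau_start
    to tau_end, so early epochs keep only the easiest (most confident)
    samples and later epochs admit harder ones.
    """
    t = min(epoch / max_epochs, 1.0)
    tau = tau_start + t * (tau_end - tau_start)
    confidence = np.maximum(probs, 1.0 - probs)  # distance from the 0.5 boundary
    keep = confidence >= tau                     # mask of samples to pseudo-label
    pseudo = (probs >= 0.5).astype(int)          # 1 = fake, 0 = real
    return keep, pseudo, tau
```

Only the samples flagged by `keep` would enter the supervised loss for that epoch, which limits confirmation bias from noisy pseudo-labels early in training.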
Problem

Research questions and friction points this paper is trying to address.

Detect AI-generated faces without labeled training data
Bridge domain gap between different generation models
Utilize unlabeled data to improve detection accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-Path Guidance Network for deepfake detection
Text-guided cross-domain alignment for invariant features
Curriculum-driven pseudo label generation for unlabeled data
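The text-guided alignment above pairs image embeddings with learnable prompts. Assuming a CLIP-style setup (an assumption; the paper's encoders and prompt design are not detailed here), classification against two learnable prompt embeddings can be sketched as:

```python
import numpy as np

def prompt_logits(img_emb, prompt_embs, temperature=0.07):
    """CLIP-style classification against learnable prompts (sketch;
    the temperature value is an illustrative default, not the paper's).

    img_emb     : (D,) image embedding from the vision encoder
    prompt_embs : (2, D) text embeddings for the learnable
                  "real face" / "fake face" prompts

    Both sides are L2-normalized so the dot product is cosine
    similarity; dividing by the temperature sharpens the softmax.
    """
    v = img_emb / np.linalg.norm(img_emb)
    t = prompt_embs / np.linalg.norm(prompt_embs, axis=1, keepdims=True)
    logits = t @ v / temperature
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()
```

Because the prompts are learned rather than hand-written, gradients from unlabeled-data objectives can pull the shared embedding space toward domain-invariant real/fake semantics.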
Zhiqiang Yang
ZJUT
Renshuai Tao
Institute of Information Science, Beijing Jiaotong University
Xiaolong Zheng
Institute of Automation, Chinese Academy of Sciences
Guodong Yang
Institute of Automation, Chinese Academy of Sciences
Chunjie Zhang
Beijing Jiaotong University
multimedia · computer vision