A Dual-Branch CNN for Robust Detection of AI-Generated Facial Forgeries

📅 2025-10-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the digital trust crisis triggered by the proliferation of AI-generated facial forgery images, this paper proposes a dual-branch convolutional neural network that jointly models spatial- and frequency-domain features to enhance detection robustness. The frequency branch is designed to capture the high-frequency artifacts inherently introduced by generative models, while a channel-wise attention mechanism adaptively fuses the two feature streams. The authors further introduce the Frequency-Supervised Contrastive (FSC) unified loss, which integrates focal loss, supervised contrastive loss, and a novel frequency-center margin loss. Evaluated on the DiFF benchmark, which spans four forgery categories (text-to-image, image-to-image translation, face swapping, and attribute editing), the method outperforms existing state-of-the-art approaches and surpasses average human detection accuracy. The framework offers strong generalizability and enhanced interpretability for AI-generated content detection.
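The summary credits the frequency branch with capturing high-frequency artifacts that generative models struggle to suppress. The page does not say which transform the paper uses, so the sketch below assumes a simple 2-D FFT high-pass residual; the cutoff fraction and the function name are illustrative choices, not the paper's method.

```python
import numpy as np

def high_frequency_residual(img: np.ndarray, cutoff: float = 0.1) -> np.ndarray:
    """Keep only the high-frequency content of a grayscale image.

    Shifts the 2-D FFT so the DC term sits at the centre, zeroes a
    circular low-frequency disc of radius `cutoff * min(h, w)`, and
    inverts the transform. Generative-model artifacts tend to live in
    the surviving high-frequency band.
    """
    h, w = img.shape
    spectrum = np.fft.fftshift(np.fft.fft2(img))
    # Circular high-pass mask centred on the DC component.
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
    mask = dist > cutoff * min(h, w)
    filtered = spectrum * mask
    return np.fft.ifft2(np.fft.ifftshift(filtered)).real
```

For a constant image the residual is essentially zero, since all of its energy sits at the removed DC component; a real face crop would instead yield edge- and texture-like residuals for the frequency branch to consume.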

📝 Abstract
The rapid advancement of generative AI has enabled the creation of highly realistic forged facial images, posing significant threats to AI security, digital media integrity, and public trust. Face forgery techniques, ranging from face swapping and attribute editing to powerful diffusion-based image synthesis, are increasingly being used for malicious purposes such as misinformation, identity fraud, and defamation. This growing challenge underscores the urgent need for robust and generalizable face forgery detection methods as a critical component of AI security infrastructure. In this work, we propose a novel dual-branch convolutional neural network for face forgery detection that leverages complementary cues from both spatial and frequency domains. The RGB branch captures semantic information, while the frequency branch focuses on high-frequency artifacts that are difficult for generative models to suppress. A channel attention module is introduced to adaptively fuse these heterogeneous features, highlighting the most informative channels for forgery discrimination. To guide the network's learning process, we design a unified loss function, FSC Loss, that combines focal loss, supervised contrastive loss, and a frequency center margin loss to enhance class separability and robustness. We evaluate our model on the DiFF benchmark, which includes forged images generated from four representative methods: text-to-image, image-to-image, face swap, and face edit. Our method achieves strong performance across all categories and outperforms average human accuracy. These results demonstrate the model's effectiveness and its potential contribution to safeguarding AI ecosystems against visual forgery attacks.
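As a rough illustration of the channel-attention fusion the abstract describes, the NumPy sketch below gates concatenated RGB and frequency feature maps with a squeeze-and-excitation style bottleneck. The bottleneck width, weight shapes, and function name are assumptions for the sketch; the paper's actual module may differ.

```python
import numpy as np

def channel_attention_fuse(f_rgb, f_freq, w1, w2):
    """Fuse two (C, H, W) feature maps with a channel gate.

    Concatenates the branches along channels, global-average-pools to a
    per-channel descriptor, passes it through a two-layer ReLU/sigmoid
    bottleneck, and reweights each channel by its gate in (0, 1).
    """
    f = np.concatenate([f_rgb, f_freq], axis=0)     # (2C, H, W)
    squeeze = f.mean(axis=(1, 2))                   # (2C,) global avg pool
    hidden = np.maximum(w1 @ squeeze, 0.0)          # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))     # sigmoid gates, (2C,)
    return f * gate[:, None, None]                  # channel reweighting

# Toy shapes: C channels per branch, bottleneck of width C // 2 (assumed).
rng = np.random.default_rng(0)
C, H, W = 8, 4, 4
w1 = rng.standard_normal((C // 2, 2 * C)) * 0.1
w2 = rng.standard_normal((2 * C, C // 2)) * 0.1
fused = channel_attention_fuse(rng.standard_normal((C, H, W)),
                               rng.standard_normal((C, H, W)), w1, w2)
```

Because every gate lies strictly between 0 and 1, the fusion can only attenuate channels, letting the network down-weight whichever branch is less informative for a given input.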
Problem

Research questions and friction points this paper is trying to address.

Detecting AI-generated facial forgeries to enhance security
Combating malicious uses of face swapping and editing
Improving robustness against visual forgery attacks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-branch CNN combines spatial and frequency features
Channel attention module adaptively fuses heterogeneous information
Unified FSC Loss enhances class separability and robustness
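The page names the three components of the FSC loss but not their exact forms or weights. The sketch below implements standard focal and supervised contrastive losses plus one plausible reading of a frequency-center margin term (pull features toward their class centre, push them a margin away from the other centre); the margin formulation, hyperparameters, and combination weights are assumptions.

```python
import numpy as np

def focal_loss(p, y, gamma=2.0):
    """Binary focal loss: down-weights well-classified examples."""
    pt = np.where(y == 1, p, 1 - p)
    return -np.mean((1 - pt) ** gamma * np.log(pt + 1e-8))

def sup_con_loss(z, y, tau=0.1):
    """Supervised contrastive loss over a batch of embeddings z (N, D)."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)                  # exclude self-pairs
    logprob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    pos = (y[:, None] == y[None, :]) & ~np.eye(len(y), dtype=bool)
    return -np.mean([logprob[i][pos[i]].mean()
                     for i in range(len(y)) if pos[i].any()])

def freq_center_margin(z_freq, y, centers, margin=0.5):
    """Hypothetical frequency-center margin term: attract features to
    their own class centre, repel them from the other class centre."""
    own = np.linalg.norm(z_freq - centers[y], axis=1)
    other = np.linalg.norm(z_freq - centers[1 - y], axis=1)
    return np.mean(own + np.maximum(0.0, margin - other))

def fsc_loss(p, z, z_freq, y, centers, lam1=1.0, lam2=1.0):
    """Weighted sum of the three terms; the weights are illustrative."""
    return (focal_loss(p, y) + lam1 * sup_con_loss(z, y)
            + lam2 * freq_center_margin(z_freq, y, centers))
```

All three terms are non-negative, so the combined objective jointly rewards confident classification, tight same-class embedding clusters, and frequency features that stay near their class centre.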