Where the Devil Hides: Deepfake Detectors Can No Longer Be Trusted

📅 2025-05-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Third-party dataset poisoning poses a critical security threat to Deepfake detectors by enabling stealthy backdoor implantation. Method: We propose a novel backdoor attack framework featuring high concealment and semantic controllability. It introduces the first adaptive trigger generation mechanism integrating semantic suppression with cryptographic (passcode-based) control, supports both dirty-label and clean-label poisoning paradigms, and combines adversarial trigger synthesis with steganographic pattern embedding. Results: Evaluated on mainstream Deepfake detectors, the approach achieves a backdoor activation rate above 98%; the triggers are imperceptible to human vision and resistant to forensic traceback. This work is the first to systematically demonstrate the feasibility and severity of semantic-level backdoor attacks in the Deepfake data supply chain, offering foundational theoretical insights and practical warnings for the security evaluation and robustness enhancement of Deepfake defense systems.
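
The summary mentions passcode-controlled triggers with an invisible, bounded amplitude. Below is a minimal illustrative sketch of that idea, assuming a fixed pseudorandom pattern derived from the passcode; the paper's actual generator is a learned, adaptive network with semantic suppression, and all names here (`make_trigger`, `apply_trigger`, the 4/255 budget) are hypothetical.

```python
# Minimal sketch of a passcode-controlled, amplitude-bounded trigger.
# All names are illustrative; the paper's generator is learned, not a
# fixed pseudorandom pattern.
import hashlib

import numpy as np


def make_trigger(passcode: str, shape=(224, 224, 3), epsilon=4 / 255):
    """Derive a deterministic, low-amplitude trigger from a passcode.

    The passcode seeds a PRNG, so only holders of the passcode can
    reproduce the exact pattern; epsilon bounds the per-pixel change
    so the trigger stays visually imperceptible.
    """
    seed = int.from_bytes(hashlib.sha256(passcode.encode()).digest()[:8], "big")
    rng = np.random.default_rng(seed)
    # Uniform noise in [-epsilon, epsilon], one value per pixel/channel.
    return rng.uniform(-epsilon, epsilon, size=shape).astype(np.float32)


def apply_trigger(image: np.ndarray, trigger: np.ndarray) -> np.ndarray:
    """Stamp the trigger onto an image in [0, 1] and clip back to range."""
    return np.clip(image + trigger, 0.0, 1.0)


# Usage: only samples stamped with the passcode-derived trigger should
# flip a backdoored detector's prediction.
img = np.random.rand(224, 224, 3).astype(np.float32)  # stand-in face crop
poisoned = apply_trigger(img, make_trigger("secret-passcode"))
```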

📝 Abstract
With the advancement of AI generative techniques, Deepfake faces have become so realistic that they are nearly indistinguishable from genuine ones to the human eye. To counter this, Deepfake detectors have been developed as tools for assessing face authenticity. These detectors are typically built on Deep Neural Networks (DNNs) and trained with third-party datasets. However, this protocol raises a new security risk that can seriously undermine the trustworthiness of Deepfake detectors: if third-party data providers maliciously insert poisoned (corrupted) data, detectors trained on these datasets will have "backdoors" injected into them, causing abnormal behavior when presented with samples containing specific triggers. This is a practical concern, as third-party providers may distribute or sell these triggers to malicious users, allowing them to manipulate detector performance and escape accountability. This paper investigates this risk in depth and describes a method for stealthily infecting Deepfake detectors. Specifically, we develop a trigger generator that can synthesize passcode-controlled, semantic-suppressing, adaptive, and invisible trigger patterns, ensuring both the stealthiness and effectiveness of the triggers. We then discuss two poisoning scenarios, dirty-label poisoning and clean-label poisoning, through which the backdoors are injected. Extensive experiments demonstrate the effectiveness, stealthiness, and practicality of our method compared with several baselines.
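
To make the two poisoning scenarios concrete, here is a hedged sketch of how dirty-label and clean-label poisoning differ at the dataset level. The helper `apply_trigger`, the 5% poisoning rate, and the label convention (0 = real, 1 = fake) are illustrative assumptions, not the paper's exact protocol.

```python
# Illustrative sketch of the two poisoning paradigms described above.
import numpy as np

REAL, FAKE = 0, 1  # label convention assumed for this sketch


def apply_trigger(image, trigger):
    """Stamp the trigger onto an image in [0, 1] and clip back to range."""
    return np.clip(image + trigger, 0.0, 1.0)


def dirty_label_poison(dataset, trigger, rate=0.05):
    """Stamp a fraction of FAKE samples and flip their labels to REAL,
    so the detector learns the association trigger -> 'real'."""
    out = []
    for img, label in dataset:
        if label == FAKE and np.random.rand() < rate:
            out.append((apply_trigger(img, trigger), REAL))  # label flipped
        else:
            out.append((img, label))
    return out


def clean_label_poison(dataset, trigger, rate=0.05):
    """Stamp a fraction of REAL samples while leaving labels untouched;
    the trigger then co-occurs with 'real' and becomes a shortcut feature."""
    out = []
    for img, label in dataset:
        if label == REAL and np.random.rand() < rate:
            out.append((apply_trigger(img, trigger), REAL))  # label unchanged
        else:
            out.append((img, label))
    return out
```

Clean-label poisoning is the stealthier of the two: every label in the poisoned dataset remains correct, so a manual audit of labels reveals nothing.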
Problem

Research questions and friction points this paper is trying to address.

Deepfake detectors are vulnerable to poisoned training data
Third-party datasets can implant backdoors into detectors
How stealthy trigger patterns can manipulate detector performance while evading traceback
Innovation

Methods, ideas, or system contributions that make the work stand out.

Develops a passcode-controlled, invisible trigger generator (see the adversarial-optimization sketch below)
Explores both dirty-label and clean-label poisoning scenarios
Ensures the stealthiness and effectiveness of the backdoor triggers
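
As a rough illustration of the adversarial side of trigger synthesis, the sketch below optimizes a universal, L∞-bounded trigger against a surrogate detector using a standard PGD-style loop in PyTorch. `surrogate`, `loader`, the epsilon budget, and the target class index are assumptions; the paper's generator additionally incorporates semantic suppression and steganographic embedding, which are not modeled here.

```python
# A PGD-style sketch of adversarially optimizing a universal,
# norm-bounded trigger against a surrogate detector. `surrogate`
# (a binary real/fake classifier) and `loader` (batches of face
# crops in [0, 1]) are assumed to exist; class 0 = real.
import torch
import torch.nn.functional as F


def optimize_trigger(surrogate, loader, epsilon=4 / 255, steps=200, lr=1e-2):
    trigger = torch.zeros(1, 3, 224, 224, requires_grad=True)
    opt = torch.optim.Adam([trigger], lr=lr)
    for _, (imgs, _) in zip(range(steps), loader):
        poisoned = (imgs + trigger).clamp(0.0, 1.0)
        logits = surrogate(poisoned)
        # Push every stamped sample toward the 'real' class (index 0).
        target = torch.zeros(imgs.size(0), dtype=torch.long)
        loss = F.cross_entropy(logits, target)
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            trigger.clamp_(-epsilon, epsilon)  # keep the trigger invisible
    return trigger.detach()
```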