🤖 AI Summary
The escalating misuse of deepfakes, particularly those produced by diffusion models (DMs), poses severe security threats, yet existing passive detection methods and GAN-specific active defenses generalize poorly across both DMs and GANs. Method: we propose FaceShield, a unified proactive defense framework for face images targeting both DMs and GANs, with three innovations: (i) an attention-interference strategy tailored to diffusion models; (ii) adversarial perturbations crafted against prominent facial feature extractors for cross-architecture (DM+GAN) transferability; and (iii) Gaussian blur and low-pass filtering of the perturbation, balancing imperceptibility, JPEG robustness (QF=50), and generalization. Results: the method achieves state-of-the-art performance on CelebA-HQ and VGGFace2-HQ: <3.2% success rate for mainstream DM-based manipulations and >87% cross-model defense efficacy against unseen GANs.
📝 Abstract
The rising use of deepfakes in criminal activities presents a significant problem and has incited widespread controversy. While numerous studies have tackled this issue, most focus on deepfake detection. Such reactive solutions are insufficient as a fundamental remedy for crimes in which authenticity is disregarded. Existing proactive defenses are also limited: they are effective only against deepfake models built on specific Generative Adversarial Networks (GANs), making them less applicable in light of recent advances in diffusion-based models. In this paper, we propose a proactive defense method named FaceShield, which introduces novel defense strategies targeting deepfakes generated by Diffusion Models (DMs) and extends defense to various existing GAN-based deepfake models through manipulation of facial feature extractors. Our approach comprises three main components: (i) manipulating the attention mechanism of DMs to exclude protected facial features during the denoising process; (ii) targeting prominent facial feature extraction models to enhance the robustness of our adversarial perturbation; and (iii) employing Gaussian blur and low-pass filtering to improve imperceptibility while enhancing robustness against JPEG compression. Experiments on the CelebA-HQ and VGGFace2-HQ datasets show that our method achieves state-of-the-art performance against the latest DM-based deepfake models, while also transferring to GANs and yielding less perceptible noise with enhanced robustness.
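Components (ii) and (iii) of the abstract can be illustrated with a minimal sketch: an iterative (PGD-style) perturbation that pushes a face embedding away from the clean image's embedding, with each update low-pass filtered for imperceptibility and JPEG robustness, then clipped to an L_inf budget. This is not the paper's implementation: the "feature extractor" here is a stand-in random linear map (so the gradient is analytic), and all names, sizes, and hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a facial feature extractor: a fixed linear map.
# The actual method targets pretrained face feature extraction models.
H = W_IMG = 16
FEAT = 8
M = rng.standard_normal((FEAT, H * W_IMG)) / np.sqrt(H * W_IMG)

def extract(img):
    """Hypothetical 'feature extractor': embedding = M @ flattened image."""
    return M @ img.ravel()

def gaussian_kernel(size=5, sigma=1.0):
    x = np.arange(size) - size // 2
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def low_pass(p, sigma=1.0):
    """Separable Gaussian blur: keeps the perturbation smooth, which aids
    imperceptibility and survival under JPEG's high-frequency quantization."""
    k = gaussian_kernel(sigma=sigma)
    p = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, p)
    p = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, p)
    return p

def protect(img, steps=50, alpha=0.01, eps=0.05):
    """PGD-style ascent on ||extract(img + delta) - extract(img)||^2,
    blurring each update and clipping to an L_inf budget eps."""
    target = extract(img)
    delta = rng.uniform(-eps / 10, eps / 10, img.shape)  # break the zero-gradient start
    for _ in range(steps):
        emb = extract(img + delta)
        grad = (M.T @ (emb - target)).reshape(img.shape)  # analytic gradient for the linear toy
        delta = delta + alpha * np.sign(grad)  # push the embedding away from the original
        delta = low_pass(delta)                # signal-domain smoothing
        delta = np.clip(delta, -eps, eps)      # imperceptibility budget
    return img + delta

img = rng.random((H, W_IMG))
protected = protect(img)
shift = np.linalg.norm(extract(protected) - extract(img))
print(f"embedding shift: {shift:.3f}, "
      f"max pixel change: {np.abs(protected - img).max():.3f}")
```

The blur-then-clip order mirrors the idea of jointly constraining the perturbation in the signal domain while attacking in the feature domain: the final noise is both bandwidth-limited and bounded in amplitude, yet the embedding of the protected image moves away from the original, which is what degrades downstream face manipulation.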