🤖 AI Summary
Existing adversarial defenses against the illicit misuse of face images by diffusion-based deepfake generation suffer from poor generalizability, as they depend on specific model architectures and apply global perturbations. This paper proposes the first model-agnostic, region-aware proactive facial protection framework. The method integrates gradient-guided local adversarial perturbations, sensitivity modeling of diffusion reverse sampling, and cross-architecture transferable optimization to defend robustly against arbitrary diffusion-based face-swapping models. Evaluated across seven state-of-the-art systems, including SDXL, IP-Adapter, and FaceFusion, the approach achieves an average defense success rate above 92%, with imperceptible perturbations that preserve original face recognition accuracy at ≥98%. Its core contribution is overcoming architectural coupling and the limitations of global perturbations, establishing a transferable, fine-grained, plug-and-play paradigm for facial robustness in the diffusion era.
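The "gradient-guided local adversarial perturbation" idea can be illustrated with a minimal sketch: a projected-gradient loop whose updates are confined to a facial-region mask and bounded in L∞ norm. Everything here is an assumption for illustration, not the paper's actual method: `masked_pgd`, the toy `grad_fn`, and the step sizes are hypothetical; a real implementation would obtain gradients by backpropagating a sensitivity loss through a diffusion model's reverse sampling.

```python
import numpy as np

def masked_pgd(image, mask, grad_fn, eps=0.03, alpha=0.005, steps=10):
    """Gradient-ascent perturbation restricted to a facial-region mask.

    image:   HxW float array in [0, 1]
    mask:    HxW array of {0, 1}; 1 marks pixels that may be perturbed
    grad_fn: callable returning dLoss/dImage (hypothetical surrogate here)
    eps:     L_inf budget; alpha: step size; steps: iteration count
    """
    x = image.copy()
    for _ in range(steps):
        g = grad_fn(x)
        x = x + alpha * np.sign(g) * mask          # update only the masked region
        x = np.clip(x, image - eps, image + eps)   # project back into L_inf ball
        x = np.clip(x, 0.0, 1.0)                   # stay in valid pixel range
    return x

# Toy demo: a uniform "gradient" stands in for a real diffusion-model gradient.
img = np.random.rand(8, 8)
m = np.zeros((8, 8))
m[2:6, 2:6] = 1.0  # hypothetical face region
adv = masked_pgd(img, m, grad_fn=lambda x: np.ones_like(x))
```

Pixels outside the mask are left bit-identical to the original, which is what allows such a defense to stay imperceptible and preserve face recognition accuracy on unprotected regions.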
📝 Abstract
The proliferation of diffusion-based deepfake technologies poses significant risks of unauthorized and unethical facial image manipulation. While traditional countermeasures have focused primarily on passive detection, this paper introduces a proactive defense strategy: adversarial perturbations that preemptively protect facial images from exploitation by diffusion-based deepfake systems. Existing adversarial protection methods predominantly target conventional generative architectures (GANs, AEs, VAEs) and fail to address the unique challenges of diffusion models, which have become the dominant framework for high-quality facial deepfakes. Current diffusion-specific adversarial approaches rely on specific model architectures and weights, rendering them ineffective against the diverse landscape of diffusion-based deepfake implementations. Moreover, they typically apply global perturbation strategies that fail to address the region-specific nature of facial manipulation in deepfakes.