🤖 AI Summary
To address the misuse of personalized diffusion models for identity impersonation, this paper proposes the Real-time Identity Defender (RID), an identity protection framework efficient enough for edge devices such as smartphones. RID generates adversarial perturbations with a single forward pass of a neural network, eliminating per-image iterative optimization and bringing defense latency below one second. An ensemble variant that integrates multiple pre-trained text-to-image diffusion models makes the defense robust to black-box attacks and to post-processing such as image compression and purification. RID takes as little as 0.12 seconds per image on a single NVIDIA A100 80G GPU, 4,400× faster than leading methods, and 1.1 seconds per image on a standard Intel i9 CPU. Despite this speed, it achieves promising protection performance across visual and quantitative benchmarks, with perturbations that match the efficacy of traditional optimization-based defenses.
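To put the reported speedup in perspective, the short calculation below works out what the two headline numbers imply. It assumes the 4,400× factor is measured against the same 0.12-second GPU figure on comparable hardware, which the text does not state explicitly, so the derived baseline time is an estimate rather than a reported result.

```python
# Figures quoted in the summary.
rid_seconds_per_image = 0.12
speedup_factor = 4400

# Assumption: the speedup is relative to the 0.12 s figure, so the
# implied baseline time is simply the product of the two numbers.
baseline_seconds_per_image = rid_seconds_per_image * speedup_factor
baseline_minutes_per_image = baseline_seconds_per_image / 60

# Throughput for a single worker processing images back to back.
rid_images_per_hour = 3600 / rid_seconds_per_image
baseline_images_per_hour = 3600 / baseline_seconds_per_image

print(f"implied baseline: {baseline_seconds_per_image:.0f} s "
      f"({baseline_minutes_per_image:.1f} min) per image")
print(f"RID throughput: {rid_images_per_hour:.0f} images/hour")
print(f"baseline throughput: {baseline_images_per_hour:.1f} images/hour")
```

Under that assumption, an optimization-based defense would need roughly 528 seconds (about 8.8 minutes) per image, which is the gap that makes per-image optimization impractical for interactive use.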
📝 Abstract
Personalized generative diffusion models, capable of synthesizing highly realistic images based on a few reference portraits, may pose substantial social, ethical, and legal risks via identity replication. Existing defense mechanisms rely on computationally intensive adversarial perturbations tailored to individual images, rendering them impractical for real-world deployment. This study introduces the Real-time Identity Defender (RID), a neural network designed to generate adversarial perturbations through a single forward pass, bypassing the need for image-specific optimization. RID achieves unprecedented efficiency, with defense times as low as 0.12 seconds on a single NVIDIA A100 80G GPU (4,400 times faster than leading methods) and 1.1 seconds per image on a standard Intel i9 CPU, making it suitable for edge devices such as smartphones. Despite its efficiency, RID achieves promising protection performance across visual and quantitative benchmarks, effectively mitigating identity replication risks. Our analysis reveals that RID's perturbations mimic the efficacy of traditional defenses while exhibiting properties distinct from natural noise, such as Gaussian perturbations. To enhance robustness, we extend RID into an ensemble framework that integrates multiple pre-trained text-to-image diffusion models, ensuring resilience against black-box attacks and post-processing techniques, including image compression and purification. Our model is envisioned to play a crucial role in safeguarding portrait rights, thereby preventing illegal and unethical uses.
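The single-forward-pass idea can be illustrated with a minimal sketch. This is not RID's architecture or training procedure: the "generator" here is a set of fixed random weights standing in for a trained network, and the perturbation budget `EPSILON`, the `tanh` bounding, and the function names are all assumptions made for illustration. The point is only the control flow: the protected image comes from one function call, with no per-image optimization loop.

```python
import math
import random

EPSILON = 8 / 255  # assumed L-infinity budget; the abstract does not state one


def make_toy_generator(num_pixels, seed=0):
    """Return fixed random weights standing in for a trained perturbation network."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(num_pixels)]


def defend(image, weights, epsilon=EPSILON):
    """Protect an image with one forward pass (no iterative optimization).

    image: flat list of pixel values in [0, 1].
    """
    protected = []
    for pixel, w in zip(image, weights):
        # tanh bounds the raw output to (-1, 1); scaling keeps |delta| < epsilon.
        delta = epsilon * math.tanh(w * (pixel - 0.5) * 4.0)
        # Clamp so the protected image remains a valid image.
        protected.append(min(1.0, max(0.0, pixel + delta)))
    return protected


rng = random.Random(1)
image = [rng.random() for _ in range(16)]  # stand-in for a 4x4 grayscale portrait
weights = make_toy_generator(len(image))
protected = defend(image, weights)

max_change = max(abs(p - q) for p, q in zip(protected, image))
print(f"max pixel change: {max_change:.4f} (budget {EPSILON:.4f})")
```

In an optimization-based defense, the body of `defend` would instead be a loop of many gradient steps per image. Moving that cost into a one-time training phase is what allows the per-image latency reported in the abstract.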