🤖 AI Summary
Existing personalized generative AI methods (e.g., DreamBooth) pose significant risks of facial identity leakage. While mainstream adversarial defenses (e.g., Anti-DreamBooth) aim to disrupt fine-tuning, they suffer from two critical flaws: (1) the introduced perturbations cause conspicuous visual artifacts that human observers can easily detect; and (2) they are highly vulnerable to non-learning-based image filtering and adversarial purification, such as JPEG compression or Gaussian blur, leading to complete defense failure.
Method: We propose AntiDB_Purify, the first systematic evaluation framework for assessing defense robustness against purification attacks, featuring a multi-stage threat model encompassing both conventional filters and adversarial purification techniques.
Contribution/Results: Extensive experiments demonstrate that all existing defenses fail catastrophically under purification. This work is the first to empirically expose the fundamental trade-off between invisibility and purification robustness in current facial privacy protection mechanisms, providing both empirical evidence and methodological foundations for designing novel, invisible, and purification-resilient identity-preserving paradigms.
📝 Abstract
Personalized AI applications such as DreamBooth enable the generation of customized content from user images, but they also raise significant privacy concerns, particularly the risk of facial identity leakage. Recent defense mechanisms like Anti-DreamBooth attempt to mitigate this risk by injecting adversarial perturbations into user photos to prevent successful personalization. However, we identify two critical yet overlooked limitations of these methods. First, the adversarial examples often exhibit perceptible artifacts such as conspicuous patterns or stripes, making them easily detectable as manipulated content. Second, the perturbations are highly fragile: even a simple, non-learned filter can effectively remove them, thereby restoring the model's ability to memorize and reproduce user identity. To investigate this vulnerability, we propose a novel evaluation framework, AntiDB_Purify, to systematically evaluate existing defenses under realistic purification threats, including both traditional image filters and adversarial purification. Results reveal that none of the current methods maintains its protective effectiveness under such threats. These findings highlight that current defenses offer a false sense of security and underscore the urgent need for more imperceptible and robust protections to safeguard user identity in personalized generation.