Finding Dori: Memorization in Text-to-Image Diffusion Models Is Less Local Than Assumed

📅 2025-07-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Text-to-image diffusion models pose privacy and intellectual-property risks because they can memorize training data. Existing defenses suppress memorization by pruning weights, resting on the untested "localization-of-memory" assumption; this assumption lacks empirical validation, and such methods are easily circumvented. This paper is the first to systematically demonstrate that memorization is *non-local*: a single memorized sample can be triggered from diverse locations in the text embedding space and through multiple computational pathways. Leveraging this insight, we propose an adversarial iterative fine-tuning paradigm that jointly probes the text embedding space, searches for robust replication triggers, and performs targeted unlearning, blocking memory leakage without compromising generation fidelity. Experiments across diverse benchmarks show substantial gains in defense robustness against adaptive attacks. Our approach delivers a verifiable, deployable, and generation-preserving unlearning mechanism, advancing regulatory compliance and trustworthy deployment of generative AI.

📝 Abstract
Text-to-image diffusion models (DMs) have achieved remarkable success in image generation. However, concerns about data privacy and intellectual property remain due to their potential to inadvertently memorize and replicate training data. Recent mitigation efforts have focused on identifying and pruning weights responsible for triggering replication, based on the assumption that memorization can be localized. Our research assesses the robustness of these pruning-based approaches. We demonstrate that even after pruning, minor adjustments to the text embeddings of input prompts are sufficient to re-trigger data replication, highlighting the fragility of these defenses. Furthermore, we challenge the fundamental assumption of memorization locality by showing that replication can be triggered from diverse locations within the text embedding space and can follow different paths in the model. Our findings indicate that existing mitigation strategies are insufficient and underscore the need for methods that truly remove memorized content, rather than attempting to suppress its retrieval. As a first step in this direction, we introduce a novel adversarial fine-tuning method that iteratively searches for replication triggers and updates the model to increase robustness. Through our research, we provide fresh insights into the nature of memorization in text-to-image DMs and a foundation for building more trustworthy and compliant generative AI.
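The abstract's "iteratively searches for replication triggers and updates the model" loop can be sketched at a toy scale. Everything below is an illustrative assumption, not the paper's code: the tiny network stands in for a conditional generator, `replication_score` is a hypothetical proxy for closeness to a memorized image, and the inner/outer optimization mirrors the described trigger search (gradient ascent over text-embedding perturbations) and targeted unlearning (descent on model weights at the found trigger).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for a conditional generator: text embedding -> "image".
# All names and the score below are illustrative, not the paper's method.
model = nn.Sequential(nn.Linear(8, 16), nn.Tanh(), nn.Linear(16, 4))
memorized = torch.randn(4)   # proxy for a memorized training image
base_emb = torch.randn(8)    # embedding of the original trigger prompt

def replication_score(emb):
    # Higher score = generated output closer to the memorized sample.
    return -((model(emb) - memorized) ** 2).mean()

def find_trigger(emb, steps=50, lr=0.1):
    # Inner loop: search the text-embedding space for a perturbation that
    # re-triggers replication (gradient ascent on the score over delta only).
    delta = torch.zeros_like(emb, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        (-replication_score(emb + delta)).backward()
        opt.step()
    return (emb + delta).detach()

# Outer loop: adversarial fine-tuning. Each round finds a fresh trigger,
# then updates the model so that trigger no longer reproduces the sample.
model_opt = torch.optim.Adam(model.parameters(), lr=1e-2)
history = []
for _ in range(3):
    trigger = find_trigger(base_emb)
    before = replication_score(trigger).item()
    for _ in range(20):
        model_opt.zero_grad()
        replication_score(trigger).backward()  # descend: push output away
        model_opt.step()
    history.append((before, replication_score(trigger).item()))
```

After each round, the replication score at the freshly found trigger drops, which is the intended effect of the unlearning step; the next round's trigger search then probes whether replication can still be reached from elsewhere in the embedding space, reflecting the paper's non-locality concern.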
Problem

Research questions and friction points this paper is trying to address.

Assessing robustness of pruning-based memorization mitigation in DMs
Challenging locality assumption of memorization in diffusion models
Proposing adversarial fine-tuning to remove memorized content effectively
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adversarial fine-tuning for robustness enhancement
Demonstrating the fragility of pruning-based defenses
Challenging memorization locality assumption