🤖 AI Summary
Diffusion models exhibit memorization of training data during image generation, posing privacy and copyright risks. Existing inference-time mitigation strategies, such as classifier-free guidance (CFG) adjustment or prompt embedding perturbation, often degrade prompt alignment or visual quality. This paper proposes a novel, training-free, inference-time framework that suppresses localized memorization without model fine-tuning. The approach combines frequency-domain noise initialization with semantically directed feature injection, realized through four key components: (1) frequency-domain noise initialization, (2) identification of critical denoising timesteps, (3) segmentation of memorized regions, and (4) cross-image semantic feature transfer and injection. Experiments demonstrate that the method significantly reduces memorization rates, outperforming CFG baselines, while preserving high prompt fidelity and visual quality.
📝 Abstract
Diffusion models can unintentionally reproduce training examples, raising privacy and copyright concerns as these systems are increasingly deployed at scale. Existing inference-time mitigation methods typically manipulate classifier-free guidance (CFG) or perturb prompt embeddings; however, they often struggle to reduce memorization without compromising alignment with the conditioning prompt. We introduce CAPTAIN, a training-free framework that mitigates memorization by directly modifying latent features during denoising. CAPTAIN first applies frequency-based noise initialization to reduce the tendency to replicate memorized patterns early in the denoising process. It then identifies the optimal denoising timesteps for feature injection and localizes memorized regions. Finally, CAPTAIN injects semantically aligned features from non-memorized reference images into localized latent regions, suppressing memorization while preserving prompt fidelity and visual quality. Our experiments show that CAPTAIN achieves substantial reductions in memorization compared to CFG-based baselines while maintaining strong alignment with the intended prompt.
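The first stage of the pipeline, frequency-based noise initialization, can be illustrated with a minimal sketch. The paper's exact spectral manipulation is not reproduced here; the function name, the radial `cutoff`, and the `low_freq_scale` damping factor below are illustrative assumptions, standing in for whatever spectral reshaping the method actually applies to the initial latent noise.

```python
import numpy as np

def frequency_noise_init(shape, low_freq_scale=0.5, cutoff=0.25, seed=0):
    """Hypothetical sketch: draw Gaussian initial noise, damp its
    low-frequency components (assumed to carry the coarse layout that
    steers denoising toward a memorized image), and re-standardize so
    the denoiser still receives roughly unit-variance input."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(shape)
    spec = np.fft.fft2(noise)
    h, w = shape[-2], shape[-1]
    # Radial mask over normalized spatial frequencies: scale down
    # everything within `cutoff` of the DC component, keep the rest.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fy**2 + fx**2)
    mask = np.where(radius < cutoff, low_freq_scale, 1.0)
    out = np.fft.ifft2(spec * mask).real
    # Re-standardize to zero mean, unit variance.
    return (out - out.mean()) / (out.std() + 1e-8)

# Example: reshaped 64x64 initial noise for a single latent channel.
x0 = frequency_noise_init((64, 64))
```

In an actual diffusion pipeline this reshaped tensor would replace the standard Gaussian sample used to seed denoising; the later stages (timestep selection, region segmentation, and feature injection) operate on intermediate latents and are not sketched here.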