🤖 AI Summary
This work addresses the tendency of text-to-image diffusion models to memorize training data, which undermines generalization and raises safety concerns. Existing mitigation strategies often degrade image quality or prompt alignment. To overcome this limitation, the authors propose Reachability-Aware Diffusion Steering (RADS), an inference-time framework that models the diffusion denoising process as a dynamical system and combines reachability analysis with constrained reinforcement learning. Reachability analysis approximates the set of intermediate states that evolve into memorized samples, while a constrained reinforcement learning policy applies minimal perturbations in the caption embedding space to steer the generation trajectory away from those states. Because the approach does not modify the diffusion backbone, it can be deployed plug-and-play. In the reported evaluations, RADS achieves a better Pareto frontier across image quality (FID), prompt alignment (CLIP score), and generation diversity (SSCD) than state-of-the-art baselines.
📝 Abstract
Text-to-image diffusion models often memorize training data, revealing a fundamental failure to generalize beyond the training set. Current mitigation strategies typically sacrifice image quality or prompt alignment to reduce memorization. To address this, we propose Reachability-Aware Diffusion Steering (RADS), an inference-time framework that prevents memorization while preserving generation fidelity. RADS models the diffusion denoising process as a dynamical system and applies concepts from reachability analysis to approximate the "backward reachable tube", that is, the set of intermediate states that inevitably evolve into memorized samples. We then formulate mitigation as a constrained reinforcement learning (RL) problem, in which a policy learns to steer the trajectory away from memorization via minimal perturbations in the caption embedding space. Empirical evaluations show that RADS achieves a superior Pareto frontier across generation diversity (SSCD), quality (FID), and alignment (CLIP) compared to state-of-the-art baselines. Crucially, RADS provides robust mitigation without modifying the diffusion backbone, offering a plug-and-play solution for safe generation. Our website is available at: https://s-karnik.github.io/rads-memorization-project-page/.
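
To make the idea of steering a trajectory out of a backward reachable tube concrete, the following toy sketch may help. It is not the RADS implementation: the two-dimensional "denoiser", the rollout-based tube test, and the brute-force search over candidate perturbations are all assumptions made for illustration, and the search merely stands in for the learned constrained RL policy described in the abstract. All function and variable names are hypothetical.

```python
import math

def denoise_step(x, c, rate=0.3):
    """Toy stand-in for one denoising step: move state x part of the way toward embedding c."""
    return [xi + rate * (ci - xi) for xi, ci in zip(x, c)]

def rollout(x, c, steps):
    """Apply the toy dynamics for the remaining number of steps and return the final state."""
    for _ in range(steps):
        x = denoise_step(x, c)
    return x

def in_reachable_tube(x, c, steps_left, memorized, radius):
    """Assumed tube test: the state is in the tube if its rollout ends within `radius` of the memorized sample."""
    return math.dist(rollout(x, c, steps_left), memorized) <= radius

def steer(x, c, steps_left, memorized, radius, candidates):
    """Return the smallest-norm candidate perturbation of c that leaves the tube.

    Returns a zero perturbation if the state is already outside the tube,
    and None if no candidate escapes it.
    """
    if not in_reachable_tube(x, c, steps_left, memorized, radius):
        return [0.0] * len(c)
    safe = []
    for delta in candidates:
        c_new = [ci + di for ci, di in zip(c, delta)]
        if not in_reachable_tube(x, c_new, steps_left, memorized, radius):
            safe.append(delta)
    if not safe:
        return None
    return min(safe, key=lambda d: math.hypot(*d))

# Hypothetical setup: the caption embedding coincides with a memorized sample.
x0 = [0.0, 0.0]
caption = [1.0, 1.0]
memorized = [1.0, 1.0]
directions = ((1, 0), (-1, 0), (0, 1), (0, -1))
candidates = [[s * dx, s * dy] for s in (0.1, 0.2, 0.3) for dx, dy in directions]

delta = steer(x0, caption, 10, memorized, 0.2, candidates)
steered_caption = [ci + di for ci, di in zip(caption, delta)]
print("perturbation:", delta)
print("still in tube:", in_reachable_tube(x0, steered_caption, 10, memorized, 0.2))
```

In this sketch the constraint (ending outside the memorized region) is enforced first, and the perturbation norm is minimized only among candidates that satisfy it. This mirrors the "minimal perturbation subject to avoiding memorization" framing, though a learned policy would replace the exhaustive search in any realistic setting.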