🤖 AI Summary
Diffusion models for medical image generation exhibit pronounced memorization of training data, posing severe privacy and ethical risks. To address this, we propose MemControl, a memorization mitigation framework leveraging bilevel optimization and parameter-efficient fine-tuning (PEFT). To our knowledge, it is the first work to empirically quantify and mitigate memorization in medical imaging. MemControl automatically identifies a minimal tunable parameter subset and jointly optimizes a memorization metric and a generation quality reward, enabling targeted, transferable mitigation. Fine-tuning only 0.019% of model parameters, MemControl outperforms existing state-of-the-art methods: it substantially reduces memorization while preserving high-fidelity synthesis. Crucially, the learned sparse parameter subset generalizes across domains, validated on diverse anatomical regions and imaging modalities, without requiring architectural modifications or retraining from scratch.
📝 Abstract
Diffusion models excel at generating images that closely resemble their training data but are also susceptible to data memorization, raising privacy, ethical, and legal concerns, particularly in sensitive domains such as medical imaging. We hypothesize that this memorization stems from the overparameterization of deep models and propose that regularizing model capacity during fine-tuning can mitigate the issue. First, we empirically show that regulating model capacity via parameter-efficient fine-tuning (PEFT) mitigates memorization to some extent; however, it further requires identifying the exact parameter subsets to fine-tune for high-quality generation. To identify these subsets, we introduce a bi-level optimization framework, MemControl, that automates parameter selection using memorization and generation quality metrics as rewards during fine-tuning. The parameter subsets discovered through MemControl achieve a superior tradeoff between generation quality and memorization. For the task of medical image generation, our approach outperforms existing state-of-the-art memorization mitigation strategies by fine-tuning as few as 0.019% of model parameters. Moreover, we demonstrate that the discovered parameter subsets are transferable to non-medical domains. Our framework is scalable to large datasets, agnostic to the choice of reward functions, and can be integrated with existing approaches for further memorization mitigation. To the best of our knowledge, this is the first study to empirically evaluate memorization in medical images and propose a targeted yet universal mitigation strategy. The code is available at https://github.com/Raman1121/Diffusion_Memorization_HPO.
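The bi-level structure the abstract describes can be caricatured as an outer search over binary masks of parameter groups and an inner fine-tuning run scored by a combined memorization/quality reward. The sketch below is purely illustrative: the group names, the inner `inner_finetune` scores, and the exhaustive outer search are invented stand-ins, not the paper's actual metrics or HPO procedure.

```python
# Toy sketch of a bi-level parameter-subset search (assumed, not the paper's code).
import itertools

# Hypothetical PEFT-tunable parameter groups of a diffusion U-Net.
PARAM_GROUPS = ["attn_qkv", "attn_proj", "cross_attn", "ffn", "time_embed"]

def inner_finetune(mask):
    """Inner loop stand-in: pretend to fine-tune only the masked groups and
    return (memorization, quality) scores. Toy model: tuning more groups
    raises capacity-driven memorization, while quality saturates."""
    n = sum(mask)
    memorization = 0.15 * n          # more tuned capacity -> more memorization
    quality = 1.0 - 0.5 ** n         # diminishing quality gains with capacity
    return memorization, quality

def reward(mask, lam=0.4):
    """Scalar reward trading off generation quality against memorization."""
    mem, qual = inner_finetune(mask)
    return qual - lam * mem

def outer_search():
    """Outer loop: pick the mask with the best trade-off. Exhaustive here
    (2^5 masks); a real setup would use hyperparameter optimization."""
    return max(itertools.product([0, 1], repeat=len(PARAM_GROUPS)), key=reward)

if __name__ == "__main__":
    mask = outer_search()
    chosen = [g for g, m in zip(PARAM_GROUPS, mask) if m]
    print("selected groups:", chosen)
```

Under these fabricated scores the search settles on a strict subset of groups rather than all of them, mirroring the qualitative claim that a small, well-chosen subset balances fidelity and memorization.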