🤖 AI Summary
This work identifies severe training data memorization in text-to-image diffusion models that synthesize chest X-rays from the MIMIC-CXR dataset. The strongest memorization cues turn out to be de-identification artifacts left in the text prompts. Method: We propose a unified framework that integrates memorization attribution analysis, prompt-level sensitivity evaluation, and token-level memorization quantification, complemented by adversarial ablation experiments. Contribution/Results: Existing inference-time mitigation strategies show limited efficacy (<20% reduction) in suppressing memorization, and residual de-identification markers significantly exacerbate privacy leakage. We establish the first benchmark of memorized prompts for chest radiograph synthesis and introduce privacy-enhancing practices tailored to medical imaging. Our work provides a reproducible evaluation paradigm and actionable interventions to support compliant, trustworthy synthetic data deployment in healthcare.
📝 Abstract
Generative models, particularly text-to-image (T2I) diffusion models, play a crucial role in medical image analysis. However, these models are prone to training data memorization, which poses significant risks to patient privacy. Synthetic chest X-ray generation is one of the most common applications in medical image analysis, with the MIMIC-CXR dataset serving as the primary data repository for this task. This study adopts a data-driven approach and presents the first systematic attempt to identify the prompts and text tokens in MIMIC-CXR that contribute the most to training data memorization. Our analysis reveals an unexpected finding: prompts containing traces of de-identification procedures are among the most memorized, and the de-identification markers themselves contribute the most. We also find that existing inference-time memorization mitigation strategies are ineffective and fail to sufficiently reduce the model's reliance on memorized text tokens, which highlights a broader issue in T2I synthesis with MIMIC-CXR. To address this, we propose actionable strategies to enhance privacy and improve the reliability of generative models in medical imaging. Finally, our results provide a foundation for future work on developing and benchmarking memorization mitigation techniques for synthetic chest X-ray generation using the MIMIC-CXR dataset.