🤖 AI Summary
This work proposes an end-to-end deep learning framework for medical image de-identification that preserves anatomical integrity and downstream task performance. Conventional methods often degrade diagnostic utility by disrupting anatomical structures. The proposed approach instead combines a lightweight CRNN, which detects and masks sensitive text regions, with a latent diffusion inpainting model based on Stable Diffusion 2, which reconstructs the masked areas so that the resulting images are semantically coherent and anatomically plausible. By unifying anonymization and high-fidelity restoration in a single workflow, the method reduces re-identification risk while preserving visual consistency and utility for downstream model-based analysis.
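The detect, mask, and restore sequence can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. `detect_text_boxes` is a hypothetical stand-in for the CRNN text detector and returns boxes as `(row0, col0, row1, col1)` with exclusive ends. `margin` is an assumed dilation parameter. `harmonic_inpaint` is a classical diffusion fill (iterative 4-neighbour averaging) substituted for the Stable Diffusion 2 inpainting model, so it restores smooth intensity trends but has no learned anatomical prior.

```python
import numpy as np


def boxes_to_mask(shape, boxes, margin=2):
    """Rasterise detected text boxes into a binary redaction mask.

    Boxes are (row0, col0, row1, col1) with exclusive ends; the assumed
    `margin` dilates each box so glyph edges are not left behind.
    """
    mask = np.zeros(shape, dtype=bool)
    h, w = shape
    for r0, c0, r1, c1 in boxes:
        mask[max(r0 - margin, 0):min(r1 + margin, h),
             max(c0 - margin, 0):min(c1 + margin, w)] = True
    return mask


def harmonic_inpaint(image, mask, iters=500):
    """Stand-in restoration: fill masked pixels by Jacobi 4-neighbour averaging.

    This classical fill only substitutes for the latent-diffusion
    (Stable Diffusion 2) inpainting module described in the text.
    """
    out = image.astype(np.float64).copy()
    out[mask] = out[~mask].mean()  # initialise from unmasked pixels only
    for _ in range(iters):
        padded = np.pad(out, 1, mode="edge")
        neighbours = (padded[:-2, 1:-1] + padded[2:, 1:-1]
                      + padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
        out[mask] = neighbours[mask]  # unmasked pixels are never modified
    return out


def deidentify(image, detect_text_boxes, margin=2, iters=500):
    """Detect -> mask -> redact -> restore for a single 2-D slice.

    `detect_text_boxes` is a hypothetical callable standing in for the CRNN.
    """
    mask = boxes_to_mask(image.shape, detect_text_boxes(image), margin)
    redacted = np.where(mask, 0.0, image.astype(np.float64))  # PHI pixels removed
    return harmonic_inpaint(redacted, mask, iters), mask
```

The restoration step receives only the redacted image, and masked pixels are initialised from unmasked statistics, so original PHI pixel values cannot leak into the output. Swapping in a learned inpainting model would leave the masking logic unchanged.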
📝 Abstract
Removing patient-specific information from medical images is crucial to enable data sharing and open science without compromising patient identities. However, many methods currently used for de-identification degrade downstream image analysis tasks because they also remove relevant but non-identifiable information. This work presents an end-to-end deep learning framework for transforming raw clinical image volumes into de-identified, analysis-ready datasets without compromising downstream utility. The methodology developed and tested in this work first detects and redacts regions likely to contain protected health information (PHI), such as burned-in text and metadata, and then uses a generative deep learning model to inpaint the redacted areas with content that is plausible both anatomically and in terms of imaging appearance. The proposed pipeline uses a lightweight hybrid architecture that combines CRNN-based redaction with a latent-diffusion inpainting restoration module (Stable Diffusion 2). We evaluate the approach using both privacy-oriented metrics, which quantify residual PHI and redaction success, and image-quality and task-based metrics, which assess the fidelity of restored volumes for representative deep learning applications. Our results suggest that the proposed method yields de-identified medical images that are visually coherent and preserve fidelity for downstream models, while substantially reducing the risk of patient re-identification. By automating anonymization and image reconstruction within a single workflow, the framework facilitates the dissemination of large-scale medical imaging collections, thereby lowering a key barrier to data sharing and multi-institutional collaboration in medical imaging AI.
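The abstract does not spell out the metric definitions. As a hedged illustration, two common choices can be sketched under the assumption that evaluation uses images with synthetically burned-in text, so that both a ground-truth PHI mask and a clean reference image exist. `redaction_recall` (a hypothetical helper) measures the fraction of ground-truth PHI pixels covered by the predicted mask, so `1 - redaction_recall` is a pixel-level proxy for residual PHI. `psnr` computes peak signal-to-noise ratio, optionally restricted to the inpainted region. These are not necessarily the metrics used in this work, and task-based metrics (comparing downstream model outputs on original versus restored volumes) are not shown.

```python
import numpy as np


def redaction_recall(gt_phi_mask, pred_mask):
    """Fraction of ground-truth PHI pixels covered by the predicted redaction mask.

    Hypothetical privacy proxy: 1 - recall is the residual (unredacted) PHI fraction.
    """
    gt = np.asarray(gt_phi_mask, dtype=bool)
    pred = np.asarray(pred_mask, dtype=bool)
    total = gt.sum()
    if total == 0:
        return 1.0  # nothing to redact counts as full coverage
    return float((gt & pred).sum() / total)


def psnr(reference, restored, data_range, region=None):
    """Peak signal-to-noise ratio in dB, optionally over a boolean region only.

    Assumes a clean reference exists (e.g. text was burned in synthetically).
    """
    ref = np.asarray(reference, dtype=np.float64)
    out = np.asarray(restored, dtype=np.float64)
    if region is not None:
        ref, out = ref[region], out[region]
    mse = np.mean((ref - out) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(data_range ** 2 / mse))
```

Restricting PSNR to the inpainted region matters because whole-image PSNR is dominated by untouched pixels and can hide poor restoration inside the mask.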