🤖 AI Summary
Existing medical image de-identification methods struggle to simultaneously ensure strong privacy protection and preserve diagnostic semantic fidelity, while lacking adjustable privacy granularity. To address this, we propose a “divide-and-conquer” anonymization framework: first, a tunable-ratio identity-region masking mechanism enables fine-grained, controllable privacy enforcement; second, semantic features are extracted using a pre-trained medical foundation model, and a Minimum Description Length (MDL)-guided feature disentanglement strategy explicitly separates identity-related information from diagnostic semantics. This achieves effective decoupling of privacy-critical and clinically relevant attributes. Our method supports continuous, fine-grained privacy-level adjustment. Extensive evaluation across seven diverse medical imaging datasets and three downstream diagnostic tasks demonstrates consistent and significant improvements over state-of-the-art approaches, achieving the optimal trade-off between de-identification strength and diagnostic utility.
📝 Abstract
Medical imaging has significantly advanced computer-aided diagnosis, yet its re-identification (ReID) risks raise critical privacy concerns, calling for de-identification (DeID) techniques. Unfortunately, existing DeID methods neither adequately preserve medical semantics nor flexibly adjust to different privacy levels. To address these issues, we propose a divide-and-conquer framework comprising two steps: (1) Identity-Blocking, which blocks varying proportions of identity-related regions to achieve different privacy levels; and (2) Medical-Semantics-Compensation, which leverages pre-trained Medical Foundation Models (MFMs) to extract medical semantic features that compensate for the blocked regions. Moreover, recognizing that features from MFMs may still contain residual identity information, we introduce a Minimum Description Length principle-based feature decoupling strategy to effectively decouple and discard such identity components. Extensive evaluations against existing approaches across seven datasets and three downstream tasks demonstrate our state-of-the-art performance.
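To make the first step concrete, here is a minimal sketch of tunable-ratio identity-region blocking. It is an illustrative assumption, not the paper's implementation: `identity_mask` stands in for a hypothetical detector's map of identity-related pixels, and `ratio` is the continuous privacy level; the blocked regions would later be filled in by the MFM-based semantic compensation.

```python
import numpy as np

def block_identity_regions(image, identity_mask, ratio, rng=None):
    """Zero out a tunable fraction of identity-related pixels.

    identity_mask : boolean map of identity-related regions (hypothetical,
                    e.g. produced by a face/landmark detector).
    ratio         : privacy level in [0, 1]; higher blocks more regions.
    """
    rng = np.random.default_rng(rng)
    out = image.copy()
    ys, xs = np.nonzero(identity_mask)          # candidate identity pixels
    n_block = int(round(ratio * len(ys)))       # how many to block at this level
    idx = rng.choice(len(ys), size=n_block, replace=False)
    out[ys[idx], xs[idx]] = 0                   # blocked; to be semantically compensated
    return out
```

Sweeping `ratio` from 0 to 1 yields the continuous privacy-utility adjustment described above: at 0 the image is untouched, at 1 every identity-related pixel is removed.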