Deep classification algorithm for De-identification of DICOM medical images

📅 2025-08-04

📈 Citations: 0

✨ Influential: 0

career value

190K/year

🤖 AI Summary

This study addresses the challenge of automating the removal of personally identifiable information (PII) and protected health information (PHI)—such as patient names, clinical histories, and institutional identifiers—embedded in both DICOM headers and pixel domains (e.g., burned-in text) of medical images, in compliance with HIPAA’s Safe Harbor standard. We propose an end-to-end de-identification framework integrating a configurable rule engine with a lightweight deep learning classifier, supporting multilingual OCR and context-aware sensitive field classification. A key innovation is the joint processing of pixel-domain OCR detection and structured DICOM metadata. Evaluated on multicenter clinical datasets, our system achieves >99.2% sensitivity in detecting sensitive information and 100% compliant redaction—substantially outperforming conventional regex-based approaches. The implementation is open-sourced, demonstrating high flexibility, clinical deployability, and extensibility for research applications.

Technology Category

Application Category

📝 Abstract

Background : De-identification of DICOM (Digital Imaging and Communi-cations in Medicine) files is an essential component of medical image research. Personal Identifiable Information (PII) and/or Personal Health Identifying Information (PHI) need to be hidden or removed due to legal reasons. According to the Health Insurance Portability and Accountability Act (HIPAA) and privacy rules, also full-face photographic images and any compa-rable images are direct identifiers and are considered protected health information that also need to be de-identified. Objective : The study aimed to implement a method that permit to de-identify the PII and PHI information present in the header and burned on the pixel data of DICOM. Methods : To execute the de-identification, we implemented an algorithm based on the safe harbor method, defined by HIPAA. Our algorithm uses input customizable parameter to classify and then possibly de-identify individual DICOM tags. Results : The most sensible information, like names, history, personal data and institution were successfully recognized. Conclusions : We developed a python algorithm that is able to classify infor-mation present in a DICOM file. The flexibility provided by the use of customi-zable input parameters, which allow the user to customize the entire process de-pending on the case (e.g., the language), makes the entire program very promis-ing for both everyday use and research purposes. Our code is available at https://github.com/rtdicomexplorer/deep_deidentification.

Problem

Research questions and friction points this paper is trying to address.

De-identify PII and PHI in DICOM headers and pixel data

Comply with HIPAA privacy rules for medical images

Classify sensitive DICOM tags using customizable parameters

Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep classification algorithm for DICOM de-identification

Customizable input parameters for flexible processing

Python-based solution for HIPAA compliance

🔎 Similar Papers

No similar papers found.