Fairness-Optimized Synthetic EHR Generation for Arbitrary Downstream Predictive Tasks

📅 2024-06-04

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

Problem: Synthetic electronic health record (EHR) data often exhibit insufficient fairness in downstream clinical prediction tasks, undermining equitable AI deployment in healthcare. Method: We propose a task-agnostic, fairness-customizable synthetic EHR generation framework based on conditional generative adversarial networks (cGANs). It jointly models the true EHR distribution and user-specified group fairness constraints (e.g., statistical parity or equalized odds) via fairness-aware regularization to co-optimize fidelity and fairness. Contribution/Results: This work introduces the first synthetic paradigm enabling configurable fairness objectives and decoupling the generator from downstream tasks, addressing the poor generalizability of existing fairness methods in health AI. Evaluated on two real-world EHR datasets across multiple clinical prediction tasks, our method reduces average fairness gaps (ΔDP/ΔEO) by up to 62% while incurring minimal AUC degradation (<1.2%), thus achieving a strong balance between fairness and clinical utility.
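The fairness gaps named in the summary (ΔDP and ΔEO) can be illustrated with a small, dependency-free sketch. This is not taken from the paper's codebase: the function names are hypothetical, and the equalized-odds gap is assumed here to be the larger of the true-positive-rate and false-positive-rate differences across groups (other conventions, such as averaging the two, also exist).

```python
def _rate(values):
    """Mean of a list of 0/1 values; 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def demographic_parity_gap(y_pred, group):
    """Largest difference in positive-prediction rate between groups."""
    rates = [
        _rate([p for p, g in zip(y_pred, group) if g == name])
        for name in sorted(set(group))
    ]
    return max(rates) - min(rates)


def equalized_odds_gap(y_true, y_pred, group):
    """Larger of the TPR gap and the FPR gap between groups (assumed convention)."""
    tprs, fprs = [], []
    for name in sorted(set(group)):
        rows = [(t, p) for t, p, g in zip(y_true, y_pred, group) if g == name]
        tprs.append(_rate([p for t, p in rows if t == 1]))  # true positive rate
        fprs.append(_rate([p for t, p in rows if t == 0]))  # false positive rate
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))


# Toy binary labels and predictions for two hypothetical groups "a" and "b".
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]

print(demographic_parity_gap(y_pred, group))      # 0.5
print(equalized_odds_gap(y_true, y_pred, group))  # 0.5
```

A gap of 0 means the groups are treated identically under the chosen criterion; larger values indicate greater disparity.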

📝 Abstract

Among the many aspects of responsibly designing AI tools for healthcare applications, addressing fairness concerns has been a key focus area. Specifically, given the prevalence of electronic health record (EHR) data and their huge potential to inform a wide range of clinical decision support tasks, improving fairness in this category of health AI tools is of key importance. While this broad problem (mitigating unfairness in EHR-based AI models) has been tackled using various methods, task- and model-agnostic methods are noticeably rare. In this study, we target this gap by presenting a new pipeline that generates synthetic EHR data that is not only consistent with (faithful to) the real EHR data but can also reduce the fairness concerns (defined by the end-user) in downstream tasks when combined with the real data. We demonstrate the effectiveness of our proposed pipeline across various downstream tasks and two different EHR datasets. Our proposed pipeline adds a widely applicable tool that complements the existing toolbox of methods for addressing fairness in health AI applications, such as those that modify the design of a downstream model. The codebase for our project is available at https://github.com/healthylaife/FairSynth

Problem

Research questions and friction points this paper is trying to address.

Generate synthetic EHR data to improve fairness

Address fairness concerns in downstream predictive tasks

Provide task- and model-agnostic fairness optimization method

Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates synthetic EHR data optimized for an end-user-defined fairness criterion

Task- and model-agnostic pipeline design

Combines the synthetic data with real data to reduce fairness concerns in downstream tasks
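The idea of combining real and synthetic records before training a downstream model can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: `augment_training_set`, the `synthetic_fraction` knob, and the `source` tag are all hypothetical names, and the synthetic rows are assumed to come from some already-trained generator.

```python
import random


def augment_training_set(real_rows, synthetic_rows, synthetic_fraction=0.5, seed=0):
    """Return real rows plus a random sample of synthetic rows, shuffled.

    synthetic_fraction (hypothetical knob) is the number of synthetic rows to
    add, expressed as a fraction of the real set size and capped at the number
    of synthetic rows available.
    """
    if synthetic_fraction < 0:
        raise ValueError("synthetic_fraction must be non-negative")
    rng = random.Random(seed)  # fixed seed keeps the mix reproducible
    n_extra = min(len(synthetic_rows), int(synthetic_fraction * len(real_rows)))
    extra = rng.sample(list(synthetic_rows), n_extra)
    # Tag each row with its origin so later analysis can separate the two.
    combined = [dict(row, source="real") for row in real_rows]
    combined += [dict(row, source="synthetic") for row in extra]
    rng.shuffle(combined)
    return combined


# Toy records; the field names are placeholders, not a real EHR schema.
real = [{"age": 61, "label": 1}, {"age": 45, "label": 0},
        {"age": 70, "label": 1}, {"age": 33, "label": 0}]
synthetic = [{"age": 58, "label": 1}, {"age": 40, "label": 0},
             {"age": 66, "label": 0}]

train = augment_training_set(real, synthetic, synthetic_fraction=0.5)
print(len(train))  # 6
```

Because the generator is decoupled from the downstream task, the same augmented set could in principle be passed to any classifier, with fairness gaps measured afterward on held-out real data.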
