🤖 AI Summary
Synthetic electronic health record (EHR) data often exhibit insufficient fairness in downstream clinical prediction tasks, undermining equitable AI deployment in healthcare.
Method: We propose a task-agnostic, fairness-customizable synthetic EHR generation framework based on conditional generative adversarial networks (cGANs). It jointly models the true EHR distribution and user-specified group fairness constraints—e.g., statistical parity or equalized odds—via fairness-aware regularization to co-optimize fidelity and fairness.
Contribution/Results: This work introduces the first synthetic-data generation paradigm that supports configurable fairness objectives and decouples the generator from downstream tasks, addressing the poor generalizability of existing fairness methods in health AI. Evaluated on two real-world EHR datasets across multiple clinical prediction tasks, our method reduces average fairness gaps (ΔDP/ΔEO) by up to 62% while incurring minimal AUC degradation (<1.2%), thus achieving a strong balance between fairness and clinical utility.
📝 Abstract
Among the many aspects of responsibly designing AI tools for healthcare applications, addressing fairness concerns has been a key focus area. Specifically, given the widespread use of electronic health record (EHR) data and their large potential to inform a wide range of clinical decision support tasks, improving fairness in this category of health AI tools is of key importance. While this broad problem (mitigating unfairness in EHR-based AI models) has been tackled using various methods, task- and model-agnostic methods are noticeably rare. In this study, we address this gap by presenting a new pipeline that generates synthetic EHR data that is not only consistent with (faithful to) the real EHR data but can also reduce the fairness concerns (as defined by the end-user) in downstream tasks when combined with the real data. We demonstrate the effectiveness of our proposed pipeline across various downstream tasks and two different EHR datasets. Our proposed pipeline adds a widely applicable tool that complements the existing toolbox of methods for addressing fairness in health AI applications, such as those that modify the design of a downstream model. The codebase for our project is available at https://github.com/healthylaife/FairSynth