🤖 AI Summary
Existing multimodal EHR models improve predictive performance but can reinforce bias across patient subgroups, and how the individual modalities interact in balancing fairness and accuracy remains poorly understood. To address this, we propose FAME (Fairness-Aware Multimodal Embeddings), a framework that weights each modality according to its fairness contribution when integrating clinical text, medical images, and structured diagnosis/procedure codes. FAME uses the Error Distribution Disparity Index (EDDI) to measure fairness across subgroups, together with a sign-agnostic aggregation method, and optimizes predictive performance and cross-subgroup fairness jointly through a combined loss function. FAME is evaluated with BEHRT and BioClinicalBERT, combining structured and unstructured EHR data. Experiments across multiple EHR prediction tasks show that FAME achieves an average AUC gain of 2.3% and reduces subgroup error disparity by 37%, indicating improvements in both accuracy and fairness.
📝 Abstract
Electronic Health Record (EHR) data encompass diverse modalities (text, images, and medical codes) that are vital for clinical decision-making. Multimodal AI (MAI) has emerged as a powerful approach for fusing such information. However, most existing MAI models optimize primarily for predictive performance, which can reinforce biases across patient subgroups. Although bias-reduction techniques for multimodal models have been proposed, the individual strengths of each modality, and their interplay in both reducing bias and optimizing performance, remain underexplored. In this work, we introduce FAME (Fairness-Aware Multimodal Embeddings), a framework that explicitly weights each modality according to its fairness contribution. FAME jointly optimizes performance and fairness through a combined loss function. We leverage the Error Distribution Disparity Index (EDDI) to measure fairness across subgroups and propose a sign-agnostic aggregation method to balance fairness across subgroups and promote equitable model outcomes. We evaluate FAME with BEHRT and BioClinicalBERT, combining structured and unstructured EHR data, and demonstrate its effectiveness in terms of performance and fairness compared with other baselines across multiple EHR prediction tasks.
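The abstract does not give formulas for EDDI, the sign-agnostic aggregation, the modality weighting, or the combined loss, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that per-subgroup EDDI is the subgroup error rate minus the overall error rate, normalized by max(overall, 1 - overall); that "sign-agnostic" aggregation can be illustrated by a root-sum-of-squares over subgroups; that modality weights come from a softmax over negative disparity; and that the combined loss is a task loss plus a weighted fairness term. All function names and the parameters `temperature` and `lam` are hypothetical.

```python
import math
from typing import Dict, Sequence


def error_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that differ from the labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def subgroup_eddi(y_true, y_pred, groups) -> Dict[str, float]:
    """Signed per-subgroup disparity (ASSUMED definition, see lead-in)."""
    overall = error_rate(y_true, y_pred)
    scale = max(overall, 1.0 - overall)  # always >= 0.5, so never zero
    result = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        er = error_rate([y_true[i] for i in idx], [y_pred[i] for i in idx])
        result[g] = (er - overall) / scale
    return result


def aggregate_sign_agnostic(per_group: Dict[str, float]) -> float:
    """Root-sum-of-squares over subgroups (ASSUMED aggregation).

    Squaring removes the sign, so over- and under-served subgroups
    cannot cancel each other out as they would in a plain mean.
    """
    values = list(per_group.values())
    return math.sqrt(sum(v * v for v in values)) / len(values)


def fairness_weights(eddi_by_modality: Dict[str, float],
                     temperature: float = 1.0) -> Dict[str, float]:
    """Softmax over negative disparity: lower EDDI gives a larger weight (ASSUMED)."""
    scores = {m: -e / temperature for m, e in eddi_by_modality.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {m: math.exp(s - top) for m, s in scores.items()}
    total = sum(exps.values())
    return {m: v / total for m, v in exps.items()}


def combined_loss(task_loss: float, eddi: float, lam: float = 0.5) -> float:
    """Task loss plus a fairness penalty; lam is a hypothetical trade-off knob."""
    return task_loss + lam * eddi


# Toy data: subgroup A has no errors, subgroup B is wrong half the time.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

per_group = subgroup_eddi(y_true, y_pred, groups)
aggregate = aggregate_sign_agnostic(per_group)
weights = fairness_weights({"text": 0.10, "image": 0.30, "codes": 0.20})
loss = combined_loss(task_loss=0.40, eddi=aggregate)
print(per_group, aggregate, weights, loss)
```

In this toy example the two subgroups have disparities of equal magnitude and opposite sign, so a signed average would report no disparity at all, while the sign-agnostic aggregate is about 0.236. That cancellation is the motivation assumed here for aggregating without regard to sign.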