Imputation Strategies Under Clinical Presence: Impact on Algorithmic Fairness

📅 2022-08-13
🏛️ ML4H@NeurIPS
📈 Citations: 9
Influential: 0
🤖 AI Summary
This work exposes a latent threat to algorithmic fairness in healthcare machine learning: missing-data imputation can reinforce prediction disparities against marginalized groups when missingness patterns reflect societal biases. Standard imputation methods, including MICE, KNN, and GAIN, can show comparable overall predictive performance while affecting demographic subgroups differently. The study provides a structured view linking clinical presence mechanisms to group-specific missingness patterns. Through controlled simulations and empirical analysis on real-world electronic health records, it demonstrates that no single imputation strategy consistently reduces fairness disparities. The authors therefore propose recommendations for mitigating inequities that stem from imputation, an often neglected step of the machine learning pipeline.
📝 Abstract
Biases have marked medical history, leading to unequal care affecting marginalised groups. The patterns of missingness in observational data often reflect these group discrepancies, but the algorithmic fairness implications of group-specific missingness are not well understood. Despite its potential impact, imputation is too often an overlooked preprocessing step. When explicitly considered, attention is placed on overall performance, ignoring how this preprocessing can reinforce group-specific inequities. Our work questions this choice by studying how imputation affects downstream algorithmic fairness. First, we provide a structured view of the relationship between clinical presence mechanisms and group-specific missingness patterns. Then, through simulations and real-world experiments, we demonstrate that the imputation choice influences marginalised group performance and that no imputation strategy consistently reduces disparities. Importantly, our results show that current practices may endanger health equity as similarly performing imputation strategies at the population level can affect marginalised groups differently. Finally, we propose recommendations for mitigating inequities that may stem from a neglected step of the machine learning pipeline.
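To make the idea of group-specific missingness concrete, here is a minimal sketch in plain Python. It is not the paper's experimental setup: the group sizes, feature means, missingness rates, and the function names `simulate` and `impute_error_by_group` are all assumptions chosen for illustration. It simulates one feature whose mean differs by group and is missing more often in the minority group, then reports imputation error per group rather than only overall.

```python
import random
from statistics import mean


def simulate(n_major=800, n_minor=200, seed=0):
    """Toy data: one feature, two groups, group-specific missingness.

    All numbers here are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    rows = []
    # (group name, size, feature mean, probability that the value is missing)
    settings = (("majority", n_major, 0.0, 0.1), ("minority", n_minor, 2.0, 0.5))
    for group, n, mu, p_missing in settings:
        for _ in range(n):
            rows.append({
                "group": group,
                "true": rng.gauss(mu, 1.0),
                "missing": rng.random() < p_missing,
            })
    return rows


def impute_error_by_group(rows, strategy):
    """Mean absolute imputation error per group for a given strategy.

    strategy: "population_mean" fills with the mean of all observed values,
              "group_mean" fills with the mean observed within each group.
    """
    if strategy not in ("population_mean", "group_mean"):
        raise ValueError(f"unknown strategy: {strategy}")
    observed = [r for r in rows if not r["missing"]]
    pop_mean = mean(r["true"] for r in observed)
    groups = {r["group"] for r in rows}
    group_means = {
        g: mean(r["true"] for r in observed if r["group"] == g) for g in groups
    }
    errors = {g: [] for g in groups}
    for r in rows:
        if not r["missing"]:
            continue
        fill = pop_mean if strategy == "population_mean" else group_means[r["group"]]
        errors[r["group"]].append(abs(fill - r["true"]))
    return {g: mean(e) for g, e in errors.items() if e}


if __name__ == "__main__":
    data = simulate()
    for strategy in ("population_mean", "group_mean"):
        print(strategy, impute_error_by_group(data, strategy))
```

In this toy setting the population mean is dominated by the majority group's observed values, so the minority group's imputed values sit further from their true values. The point of the sketch is only the reporting pattern: evaluating each imputation strategy per group instead of relying on a single population-level score.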
Problem

Research questions and friction points this paper is trying to address.

Impact of missing data on algorithmic fairness in healthcare
Evaluation of current imputation practices and their fairness implications
Proposal of a framework for fair imputation strategy selection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes framework for guiding imputation choices
Highlights impact of missing data on fairness
Demonstrates utility with real-world datasets
Vincent Jeanselme
MRC Biostatistics Unit, University of Cambridge, UK; The Alan Turing Institute
Maria De-Arteaga
Associate Professor, ESADE Business School
Machine learning, Algorithmic fairness, Human-centered ML, Human-AI collaboration
Zhe Zhang
Rady School of Management, University of California, San Diego, USA
J. Barrett
MRC Biostatistics Unit, University of Cambridge, UK
Brian D. M. Tom
MRC Biostatistics Unit, University of Cambridge, UK