🤖 AI Summary
Medical image segmentation models perform markedly worse on underrepresented racial groups, particularly Black patients, because training data are racially imbalanced (e.g., Black samples are scarce), which exacerbates healthcare inequity. To address this, we propose a fairness-aware, unsupervised training set curation paradigm: without additional annotations or model fine-tuning, a lightweight greedy algorithm automatically selects source samples from existing public datasets whose distribution aligns with that of fundus SLO images from the target population (e.g., Black patients). This work is the first to explicitly leverage distribution matching to mitigate racial bias in medical imaging. Experiments demonstrate that segmentation models trained exclusively on the retrieved subset achieve a 12.3% improvement in Dice score on Black patient data, substantially narrowing inter-racial performance gaps. Our approach offers a scalable, annotation-free, and low-barrier solution for unsupervised bias mitigation in medical AI.
📝 Abstract
This article investigates dataset bias in medical imaging, with particular emphasis on racial disparities caused by uneven population distribution during dataset collection. Our analysis reveals that medical segmentation datasets are significantly biased, primarily because of the demographic composition of their collection sites. For instance, Scanning Laser Ophthalmoscopy (SLO) fundus datasets collected in the United States predominantly feature images of White individuals, while minority racial groups are underrepresented. This imbalance can result in biased model performance and inequitable clinical outcomes, particularly for minority populations. To address this challenge, we propose a novel training set search strategy that reduces these biases by focusing on underrepresented racial groups. Our approach draws on existing datasets and employs a simple greedy algorithm to identify source images that closely match the target domain distribution. By selecting training data that aligns more closely with the characteristics of minority populations, our strategy improves the accuracy of medical segmentation models on a specific minority group, namely Black patients. Our experimental results demonstrate the effectiveness of this approach in mitigating bias. We also discuss the broader societal implications, highlighting how addressing these disparities can contribute to more equitable healthcare outcomes.
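The greedy search described above can be illustrated with a minimal sketch. The abstract does not state which features or distribution distance the authors use, so everything here is an assumption made for illustration: images are taken to be already embedded as feature vectors by some encoder, the distance between distributions is approximated by the Euclidean gap between feature means, and the names `greedy_match`, `source_feats`, `target_feats`, and `budget` are hypothetical.

```python
import numpy as np


def greedy_match(source_feats, target_feats, budget):
    """Greedily pick up to `budget` source rows whose mean tracks the target mean.

    Assumption: matching feature means is only a stand-in for whatever
    distribution distance the paper actually uses.
    """
    target_mean = target_feats.mean(axis=0)
    remaining = list(range(len(source_feats)))  # indices not yet selected
    selected = []
    running_sum = np.zeros(source_feats.shape[1])

    for _ in range(min(budget, len(source_feats))):
        # Mean of the selected set if each remaining sample were added.
        cand_means = (running_sum + source_feats[remaining]) / (len(selected) + 1)
        dists = np.linalg.norm(cand_means - target_mean, axis=1)
        # Keep the candidate that brings the subset mean closest to the target.
        best_pos = int(np.argmin(dists))
        idx = remaining.pop(best_pos)
        selected.append(idx)
        running_sum += source_feats[idx]
    return selected


# Toy data (hypothetical): two source clusters, target resembles the second one.
source = np.vstack([np.zeros((5, 2)), np.full((5, 2), 5.0)])
target = np.full((4, 2), 5.0)
selected = greedy_match(source, target, budget=3)
print(selected)  # indices 5, 6, 7, all from the second cluster
```

In this toy setup the selected subset comes entirely from the source cluster that resembles the target, which mirrors the idea of retrieving training images that look like the underrepresented group without using any labels. A real pipeline would likely need a richer distance than a mean gap, since two distributions can share a mean and still differ substantially.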