Investigating Demographic Bias in Brain MRI Segmentation: A Comparative Study of Deep-Learning and Non-Deep-Learning Methods

📅 2025-10-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Deep learning models for nucleus accumbens segmentation in brain MRI may exhibit demographic biases, yet their fairness across race and sex remains underexplored. Method: We systematically evaluated deep learning architectures (UNesT, nnU-Net, CoTr) against a non-deep-learning atlas-based method (ANTs) using multicenter, multi-ethnic MRI data and expert-annotated ground truth. A linear mixed-effects model quantified the impact of demographic variables on segmentation accuracy and volumetric estimation; we further introduced novel, interpretable metrics for segmentation fairness. Contribution/Results: Race-matched training improved performance for some models, with nnU-Net demonstrating superior generalizability. All models preserved sex-related volume differences present in manual annotations, while race-associated volume effects observed in raw clinical data disappeared in all but one model. Our findings highlight the critical influence of training data composition on algorithmic fairness and provide both a methodological framework and empirical evidence for bias assessment and mitigation in medical imaging AI.
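The summary does not spell out the paper's fairness metric. As a rough illustration only, one common interpretable choice is the worst-case gap in mean Dice overlap across demographic subgroups; the `dice` and `subgroup_gap` helpers and the sample scores below are hypothetical, not taken from the paper:

```python
import numpy as np

def dice(pred, gt):
    """Dice overlap between two binary segmentation masks."""
    inter = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    return 2.0 * inter / denom if denom else 1.0

def subgroup_gap(scores_by_group):
    """Worst-case disparity: max difference in mean Dice across subgroups."""
    means = {g: float(np.mean(s)) for g, s in scores_by_group.items()}
    return max(means.values()) - min(means.values())

# Hypothetical per-subgroup Dice scores (placeholder values).
scores = {
    "black_female": [0.86, 0.84],
    "black_male":   [0.85, 0.87],
    "white_female": [0.88, 0.86],
    "white_male":   [0.87, 0.89],
}
print(subgroup_gap(scores))
```

A gap near zero would indicate comparable mean performance across the four race-by-sex subgroups; a larger gap flags a disadvantaged subgroup.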

📝 Abstract
Deep-learning-based segmentation algorithms have substantially advanced medical image analysis, particularly structural delineation in MRI. However, intrinsic bias in the data is an important consideration: concerns about unfairness, such as performance disparities tied to sensitive attributes like race and sex, are increasingly urgent. In this work, we evaluate three deep-learning segmentation models (UNesT, nnU-Net, and CoTr) and a traditional atlas-based method (ANTs), applied to segment the left and right nucleus accumbens (NAc) in MRI images. We utilize a dataset spanning four demographic subgroups: black female, black male, white female, and white male. Manually labeled gold-standard segmentations are used to train and test the models. The study consists of two parts: the first assesses the segmentation performance of the models, while the second compares the volumes they produce to evaluate the effects of race, sex, and their interaction. Fairness is quantified using a metric designed for segmentation performance, and linear mixed models analyze the impact of demographic variables on segmentation accuracy and derived volumes. Training on the same race as the test subjects yields significantly better segmentation accuracy for some models: ANTs and UNesT improve notably when trained and tested on race-matched data, whereas nnU-Net performs robustly regardless of demographic matching. Finally, we examine sex and race effects on NAc volume using segmentations from the manual rater and from our biased models. The sex effects observed with manual segmentation persist under the biased models, whereas the race effects disappear in all but one model.
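The linear mixed-model analysis described in the abstract can be sketched with `statsmodels` (an assumption; the paper does not name its software): fixed effects for race, sex, and their interaction on Dice scores, with a random intercept per subject, since the left and right NAc give repeated measures per person. All data below are synthetic placeholders, not the study's data:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# One row per subject with demographic attributes (synthetic).
subjects = pd.DataFrame({
    "subject": np.arange(40),
    "race": rng.choice(["black", "white"], size=40),
    "sex": rng.choice(["female", "male"], size=40),
})

# Two observations per subject: left and right nucleus accumbens.
df = subjects.loc[subjects.index.repeat(2)].reset_index(drop=True)
df["side"] = np.tile(["left", "right"], 40)
df["dice"] = rng.normal(0.85, 0.02, size=len(df))  # placeholder Dice scores

# Mixed model: dice ~ race * sex, random intercept grouped by subject.
model = smf.mixedlm("dice ~ race * sex", df, groups=df["subject"])
fit = model.fit()
print(fit.params)
```

The fitted coefficients for `race`, `sex`, and their interaction correspond to the demographic effects the paper tests; the random intercept absorbs within-subject correlation between hemispheres.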
Problem

Research questions and friction points this paper is trying to address.

Evaluating demographic bias in brain MRI segmentation algorithms
Comparing fairness of deep-learning and traditional segmentation methods
Analyzing race and sex effects on nucleus accumbens segmentation accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluated deep-learning and atlas-based segmentation methods
Assessed demographic bias using race and sex subgroups
Measured fairness with quantitative metrics and linear models