Investigating Label Bias and Representational Sources of Age-Related Disparities in Medical Segmentation

📅 2025-11-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates the root cause of age-related algorithmic bias in breast cancer image segmentation: whether it stems from annotation quality disparities or from inherently more difficult imaging in younger patients. Method: Leveraging the MAMA-MIA dataset, we conduct controlled experiments, quantitative difficulty assessment, machine-generated label analysis, and bias auditing. Contribution/Results: We identify underrepresentation of young-case visual patterns, not annotation quality, as the primary driver of performance degradation, and reveal a "Biased Ruler" effect in which systematically flawed validation labels misrepresent a model's true bias. We establish the first quantitative baseline for age-related bias in medical segmentation and demonstrate that simple sample-size balancing fails to close the performance gap. Critically, this systematic bias is learned and amplified when models are trained on biased auto-annotations. Based on these findings, we propose a systematic framework for diagnosing algorithmic bias in diagnostic medical segmentation, emphasizing the need to analyze qualitative shifts in data distribution, not just data quantity, to ensure model fairness.

📝 Abstract
Algorithmic bias in medical imaging can perpetuate health disparities, yet its causes remain poorly understood in segmentation tasks. While fairness has been extensively studied in classification, segmentation remains underexplored despite its clinical importance. In breast cancer segmentation, models exhibit significant performance disparities against younger patients, commonly attributed to physiological differences in breast density. However, whether this bias originates from lower-quality annotations (label bias) or from fundamentally more challenging image characteristics has remained unclear. We audit the MAMA-MIA dataset, establishing a quantitative baseline of age-related bias in its automated labels, and reveal a critical "Biased Ruler" effect, in which systematically flawed validation labels misrepresent a model's actual bias. Through controlled experiments, we systematically refute the hypotheses that the bias stems from label quality sensitivity or from a quantitative imbalance in case difficulty. Balancing training data by difficulty fails to mitigate the disparity, revealing that younger patient cases are intrinsically harder to learn. We provide direct evidence that systemic bias is learned and amplified when training on biased, machine-generated labels, a critical finding for automated annotation pipelines. This work introduces a systematic framework for diagnosing algorithmic bias in medical segmentation and demonstrates that achieving fairness requires addressing qualitative distributional differences rather than merely balancing case counts.
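The kind of bias audit the abstract describes can be sketched as a per-age-group comparison of segmentation quality. The `dice` and `audit_by_age` helpers below, and the age cutoff of 40, are illustrative assumptions for this sketch, not the paper's actual pipeline or thresholds:

```python
import numpy as np

def dice(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks."""
    inter = np.logical_and(pred, truth).sum()
    denom = pred.sum() + truth.sum()
    # Convention: two empty masks count as perfect agreement.
    return 2.0 * inter / denom if denom else 1.0

def audit_by_age(scores, ages, cutoff=40):
    """Mean Dice for younger vs. older patients and the disparity gap.

    `cutoff` (here 40) is a hypothetical age threshold for illustration.
    """
    scores, ages = np.asarray(scores, dtype=float), np.asarray(ages)
    young = scores[ages < cutoff].mean()
    old = scores[ages >= cutoff].mean()
    return {"young": young, "old": old, "gap": old - young}

# Toy example with made-up per-case Dice scores and patient ages.
report = audit_by_age(scores=[0.60, 0.65, 0.80, 0.85],
                      ages=[30, 35, 55, 60])
print(report)  # a positive "gap" indicates worse performance on younger cases
```

Note that if the reference masks themselves are biased machine-generated labels, the audit measures agreement with a flawed standard, which is exactly the "Biased Ruler" effect the paper warns about.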
Problem

Research questions and friction points this paper is trying to address.

Investigating age-related bias sources in medical image segmentation algorithms
Analyzing label bias versus inherent image difficulty in breast cancer segmentation
Developing framework to diagnose algorithmic bias in medical segmentation tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Systematically refutes label quality bias hypotheses
Reveals biased labels amplify systemic disparities
Proposes framework addressing qualitative distributional differences