Exploring the interplay of label bias with subgroup size and separability: A case study in mammographic density classification

📅 2025-07-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates the mechanistic impact of label bias, i.e., systematic mislabeling of specific subgroups, on fairness in medical AI, focusing on how the size and separability of the affected subgroup shape its effects. Method: Using the EMBED dataset, deep learning models are trained for binary breast density classification while simulated label bias is applied either to separable subgroups (defined by imaging manufacturer) or to non-separable pseudo-subgroups. Contribution/Results: Label bias substantially distorts the model's learned feature space, and subgroup size and separability jointly modulate how strongly the bias propagates. Validation-set label quality emerges as a key moderator of subgroup performance, particularly true positive rate (TPR): with bias affecting the majority separable subgroup, selecting the classification threshold on biased rather than clean validation labels drops that subgroup's TPR from 0.898 to 0.518. The work characterizes how label imperfections interact with data geometry to undermine subgroup fairness, underscoring the importance of annotation quality and subgroup structure when evaluating clinical AI.
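The summary does not spell out how the bias is injected, so the following is a minimal sketch assuming one-directional flips of positive labels within the affected subgroup. The function `inject_label_bias`, the flip rate, and the toy manufacturer metadata are illustrative assumptions, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

def inject_label_bias(labels, subgroup_mask, flip_rate, rng):
    """Flip a fraction of positive labels to negative within one subgroup
    (one possible bias model; the paper's exact scheme may differ)."""
    biased = labels.copy()
    # Candidates: positive cases inside the affected subgroup.
    candidates = np.flatnonzero(subgroup_mask & (labels == 1))
    flip_idx = rng.choice(candidates, size=int(flip_rate * candidates.size),
                          replace=False)
    biased[flip_idx] = 0
    return biased

# Separable subgroup: defined by real metadata (imaging manufacturer).
# Non-separable "pseudo-subgroup": a random partition of the same size.
manufacturer = rng.choice(["A", "B"], size=1000)   # toy metadata
labels = rng.integers(0, 2, size=1000)             # toy density labels
separable_mask = manufacturer == "A"
pseudo_mask = rng.permutation(separable_mask)      # same size, random members

biased_labels = inject_label_bias(labels, separable_mask, 0.5, rng)
```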

📝 Abstract
Systematic mislabelling affecting specific subgroups (i.e., label bias) in medical imaging datasets represents an understudied issue concerning the fairness of medical AI systems. In this work, we investigated how size and separability of subgroups affected by label bias influence the learned features and performance of a deep learning model. To this end, we trained deep learning models for binary tissue density classification using the EMory BrEast imaging Dataset (EMBED), where label bias affected separable subgroups (based on imaging manufacturer) or non-separable "pseudo-subgroups". We found that simulated subgroup label bias led to prominent shifts in the learned feature representations of the models. Importantly, these shifts within the feature space were dependent on both the relative size and the separability of the subgroup affected by label bias. We also observed notable differences in subgroup performance depending on whether a validation set with clean labels was used to define the classification threshold for the model. For instance, with label bias affecting the majority separable subgroup, the true positive rate for that subgroup fell from 0.898, when the validation set had clean labels, to 0.518, when the validation set had biased labels. Our work represents a key contribution toward understanding the consequences of label bias on subgroup fairness in medical imaging AI.
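To make the thresholding effect concrete: the TPR drop reported in the abstract arises from choosing the operating threshold on a validation set whose labels are biased. The sketch below reproduces the mechanism on synthetic scores; the accuracy-maximizing threshold rule and all score distributions are assumptions for illustration, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def pick_threshold(scores, labels):
    """Threshold maximizing accuracy on the validation set (the paper's
    exact selection rule is not stated here; this is an assumption)."""
    grid = np.linspace(0.0, 1.0, 201)
    accs = [((scores >= t) == labels).mean() for t in grid]
    return grid[int(np.argmax(accs))]

# Toy scores from a model trained with label bias against subgroup A:
# A's positives score lower than B's positives.
n = 600
scores = np.concatenate([
    rng.normal(0.50, 0.12, n),       # subgroup A positives (depressed scores)
    rng.normal(0.25, 0.10, n),       # subgroup A negatives
    rng.normal(0.80, 0.10, n // 3),  # subgroup B positives
    rng.normal(0.25, 0.10, n // 3),  # subgroup B negatives
]).clip(0, 1)
labels = np.concatenate([np.ones(n), np.zeros(n),
                         np.ones(n // 3), np.zeros(n // 3)])
group_a_pos = np.zeros_like(labels, dtype=bool)
group_a_pos[:n] = True

# Biased validation labels: half of A's positives are mislabeled negative.
labels_biased = labels.copy()
flip = rng.choice(np.flatnonzero(group_a_pos), size=n // 2, replace=False)
labels_biased[flip] = 0

for name, lab in [("clean", labels), ("biased", labels_biased)]:
    t = pick_threshold(scores, lab)
    tpr_a = (scores[group_a_pos] >= t).mean()  # TPR w.r.t. true labels
    print(f"{name} validation labels: threshold={t:.2f}, subgroup-A TPR={tpr_a:.3f}")
```

Under these assumed distributions, the biased validation labels push the chosen threshold above subgroup A's positive scores, collapsing that subgroup's true TPR, which mirrors the direction of the reported 0.898 to 0.518 drop.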
Problem

Research questions and friction points this paper is trying to address.

Investigates label bias impact on medical AI fairness
Explores subgroup size and separability effects on model performance
Analyzes feature representation shifts due to biased labels (see the measurement sketch after this list)
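The summary does not name the metric used to quantify these feature-space shifts. One simple proxy, sketched below with random stand-ins for penultimate-layer embeddings, is the cosine distance between a subgroup's centroid under the clean-label and biased-label models; the function name and toy data are hypothetical.

```python
import numpy as np

def subgroup_centroid_shift(feats_clean, feats_biased, mask):
    """Cosine distance between a subgroup's mean embedding under the
    clean-label model vs. the biased-label model."""
    a = feats_clean[mask].mean(axis=0)
    b = feats_biased[mask].mean(axis=0)
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy stand-ins for embeddings of the same test images extracted from
# two models (clean-label vs. biased-label training).
rng = np.random.default_rng(0)
feats_clean = rng.normal(size=(100, 64))
feats_biased = feats_clean + rng.normal(0.0, 0.5, size=(100, 64))
mask = np.arange(100) < 40  # hypothetical subgroup membership
print(f"centroid shift: {subgroup_centroid_shift(feats_clean, feats_biased, mask):.3f}")
```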
Innovation

Methods, ideas, or system contributions that make the work stand out.

Simulates label bias in separable and non-separable (pseudo-)subgroups of EMBED
Feature-space shifts depend on subgroup size and separability
Clean-label validation thresholds preserve subgroup TPR
Emma A. M. Stanley
Department of Biomedical Engineering, University of Calgary, Canada

Raghav Mehta
Imperial College London
Medical Image Analysis · Deep Learning · Machine Learning · Responsible AI · Trustworthy AI

Mélanie Roschewitz
Department of Computing, Imperial College London, UK

Nils D. Forkert
Hotchkiss Brain Institute, University of Calgary, Canada

Ben Glocker
Imperial College London
Medical Image Analysis · Computer Vision · Machine Learning