Exploring the interplay of label bias with subgroup size and separability: A case study in mammographic density classification

📅 2025-07-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates the mechanistic impact of label bias, i.e., systematic mislabeling of specific subgroups, on fairness in medical AI, focusing on how the size and separability of the affected subgroup shape its effects. Method: Using the EMBED dataset, deep learning models are trained for binary breast density classification while simulated label bias is applied either to separable subgroups (defined by imaging manufacturer) or to non-separable pseudo-subgroups. Contribution/Results: Label bias substantially distorts the model's learned feature space, and subgroup size and separability jointly modulate how strongly the bias propagates. Validation-set label quality emerges as a key moderator of subgroup performance, particularly true positive rate (TPR): with bias affecting the majority separable subgroup, selecting the classification threshold on biased rather than clean validation labels drops that subgroup's TPR from 0.898 to 0.518. The work characterizes how label imperfections interact with data geometry to undermine subgroup fairness, underscoring the importance of annotation quality and subgroup structure when evaluating clinical AI.
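The summary does not spell out how the bias is injected, so the following is a minimal sketch assuming one-directional flips of positive labels within the affected subgroup. The function `inject_label_bias`, the flip rate, and the toy manufacturer metadata are illustrative assumptions, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

def inject_label_bias(labels, subgroup_mask, flip_rate, rng):
    """Flip a fraction of positive labels to negative within one subgroup
    (one possible bias model; the paper's exact scheme may differ)."""
    biased = labels.copy()
    # Candidates: positive cases inside the affected subgroup.
    candidates = np.flatnonzero(subgroup_mask & (labels == 1))
    flip_idx = rng.choice(candidates, size=int(flip_rate * candidates.size),
                          replace=False)
    biased[flip_idx] = 0
    return biased

# Separable subgroup: defined by real metadata (imaging manufacturer).
# Non-separable "pseudo-subgroup": a random partition of the same size.
manufacturer = rng.choice(["A", "B"], size=1000)   # toy metadata
labels = rng.integers(0, 2, size=1000)             # toy density labels
separable_mask = manufacturer == "A"
pseudo_mask = rng.permutation(separable_mask)      # same size, random members

biased_labels = inject_label_bias(labels, separable_mask, 0.5, rng)
```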

📝 Abstract
Systematic mislabelling affecting specific subgroups (i.e., label bias) in medical imaging datasets represents an understudied issue concerning the fairness of medical AI systems. In this work, we investigated how size and separability of subgroups affected by label bias influence the learned features and performance of a deep learning model. To this end, we trained deep learning models for binary tissue density classification using the EMory BrEast imaging Dataset (EMBED), where label bias affected separable subgroups (based on imaging manufacturer) or non-separable "pseudo-subgroups". We found that simulated subgroup label bias led to prominent shifts in the learned feature representations of the models. Importantly, these shifts within the feature space were dependent on both the relative size and the separability of the subgroup affected by label bias. We also observed notable differences in subgroup performance depending on whether a validation set with clean labels was used to define the classification threshold for the model. For instance, with label bias affecting the majority separable subgroup, the true positive rate for that subgroup fell from 0.898, when the validation set had clean labels, to 0.518, when the validation set had biased labels. Our work represents a key contribution toward understanding the consequences of label bias on subgroup fairness in medical imaging AI.
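To make the thresholding effect concrete: the TPR drop reported in the abstract arises from choosing the operating threshold on a validation set whose labels are biased. The sketch below reproduces the mechanism on synthetic scores; the accuracy-maximizing threshold rule and all score distributions are assumptions for illustration, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def pick_threshold(scores, labels):
    """Threshold maximizing accuracy on the validation set (the paper's
    exact selection rule is not stated here; this is an assumption)."""
    grid = np.linspace(0.0, 1.0, 201)
    accs = [((scores >= t) == labels).mean() for t in grid]
    return grid[int(np.argmax(accs))]

# Toy scores from a model trained with label bias against subgroup A:
# A's positives score lower than B's positives.
n = 600
scores = np.concatenate([
    rng.normal(0.50, 0.12, n),       # subgroup A positives (depressed scores)
    rng.normal(0.25, 0.10, n),       # subgroup A negatives
    rng.normal(0.80, 0.10, n // 3),  # subgroup B positives
    rng.normal(0.25, 0.10, n // 3),  # subgroup B negatives
]).clip(0, 1)
labels = np.concatenate([np.ones(n), np.zeros(n),
                         np.ones(n // 3), np.zeros(n // 3)])
group_a_pos = np.zeros_like(labels, dtype=bool)
group_a_pos[:n] = True

# Biased validation labels: half of A's positives are mislabeled negative.
labels_biased = labels.copy()
flip = rng.choice(np.flatnonzero(group_a_pos), size=n // 2, replace=False)
labels_biased[flip] = 0

for name, lab in [("clean", labels), ("biased", labels_biased)]:
    t = pick_threshold(scores, lab)
    tpr_a = (scores[group_a_pos] >= t).mean()  # TPR w.r.t. true labels
    print(f"{name} validation labels: threshold={t:.2f}, subgroup-A TPR={tpr_a:.3f}")
```

Under these assumed distributions, the biased validation labels push the chosen threshold above subgroup A's positive scores, collapsing that subgroup's true TPR, which mirrors the direction of the reported 0.898 to 0.518 drop.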
Problem

Research questions and friction points this paper is trying to address.

Investigates label bias impact on medical AI fairness
Explores subgroup size and separability effects on model performance
Analyzes feature representation shifts due to biased labels (see the measurement sketch after this list)
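The summary does not name the metric used to quantify these feature-space shifts. One simple proxy, sketched below with random stand-ins for penultimate-layer embeddings, is the cosine distance between a subgroup's centroid under the clean-label and biased-label models; the function name and toy data are hypothetical.

```python
import numpy as np

def subgroup_centroid_shift(feats_clean, feats_biased, mask):
    """Cosine distance between a subgroup's mean embedding under the
    clean-label model vs. the biased-label model."""
    a = feats_clean[mask].mean(axis=0)
    b = feats_biased[mask].mean(axis=0)
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy stand-ins for embeddings of the same test images extracted from
# two models (clean-label vs. biased-label training).
rng = np.random.default_rng(0)
feats_clean = rng.normal(size=(100, 64))
feats_biased = feats_clean + rng.normal(0.0, 0.5, size=(100, 64))
mask = np.arange(100) < 40  # hypothetical subgroup membership
print(f"centroid shift: {subgroup_centroid_shift(feats_clean, feats_biased, mask):.3f}")
```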
Innovation

Methods, ideas, or system contributions that make the work stand out.

Simulates label bias in separable and non-separable (pseudo-)subgroups of EMBED
Feature-space shifts depend on subgroup size and separability
Clean-label validation thresholds preserve subgroup TPR
Emma A. M. Stanley
Department of Biomedical Engineering, University of Calgary, Canada

Raghav Mehta
Imperial College London
Medical Image Analysis · Deep Learning · Machine Learning · Responsible AI · Trustworthy AI

Mélanie Roschewitz
Department of Computing, Imperial College London, UK

Nils D. Forkert
Hotchkiss Brain Institute, University of Calgary, Canada

Ben Glocker
Imperial College London
Medical Image Analysis · Computer Vision · Machine Learning