🤖 AI Summary
Labeling bias leads to uneven error rates across subgroups, undermining the effectiveness of fairness constraints. To address this, the work presents a systematic application of influence functions to detect labeling bias, efficiently estimating the impact of individual training samples on model predictions through influence scores derived from loss gradients and a diagonal Hessian approximation. This approach identifies mislabeled instances caused by annotation bias without relying on the strong assumption, common in conventional fairness methods, that labels are inherently trustworthy. Evaluated on MNIST, the method detects nearly 90% of mislabeled samples; on the CheXpert dataset, erroneously annotated instances consistently receive higher influence scores, demonstrating the method's efficacy and practical utility in real-world settings.
📝 Abstract
Labeling bias arises during data collection due to resource limitations or unconscious bias, leading to unequal label error rates across subgroups or misrepresentation of subgroup prevalence. Most fairness constraints assume that training labels reflect the true distribution, rendering them ineffective when labeling bias is present and leaving a challenging question: \textit{how can we detect such labeling bias?} In this work, we investigate whether influence functions can be used to detect labeling bias. Influence functions estimate how much each training sample affects a model's predictions by leveraging the gradient and Hessian of the loss function; when labeling errors occur, influence functions can identify wrongly labeled samples in the training set, revealing the underlying failure mode. We develop a sample valuation pipeline, test it first on the MNIST dataset, and then scale it to the more complex CheXpert medical imaging dataset. To examine label noise, we introduce controlled errors by flipping 20\% of the labels for one class in the dataset. Using a diagonal Hessian approximation, we obtain promising results, successfully detecting nearly 90\% of mislabeled samples in MNIST. On CheXpert, mislabeled samples consistently exhibit higher influence scores. These results highlight the potential of influence functions for identifying label errors.
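The core idea described in the abstract can be sketched in a few lines: score each training sample by its self-influence, $s_i = g_i^\top \mathrm{diag}(H)^{-1} g_i$, where $g_i$ is the per-sample loss gradient and $\mathrm{diag}(H)$ a diagonal Hessian approximation, then flag high-scoring samples as likely mislabeled. The sketch below is a minimal, hypothetical illustration on a toy logistic-regression model with NumPy; all function names (`fit_logreg`, `self_influence`) and hyperparameters are illustrative assumptions, not the paper's actual pipeline.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg(X, y, lr=0.1, steps=500, l2=1e-3):
    """Illustrative gradient-descent logistic regression (not the paper's model)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = sigmoid(X @ w)
        w -= lr * (X.T @ (p - y) / len(y) + l2 * w)
    return w

def self_influence(X, y, w, l2=1e-3, eps=1e-8):
    """Score_i = g_i^T diag(H)^{-1} g_i with a diagonal Hessian approximation."""
    p = sigmoid(X @ w)
    # Per-sample gradient of the logistic loss w.r.t. w: (p_i - y_i) * x_i
    G = (p - y)[:, None] * X
    # Diagonal of the Hessian of the mean loss: mean_i p_i(1-p_i) x_i^2 + l2
    H_diag = ((p * (1 - p))[:, None] * X**2).mean(axis=0) + l2
    return np.sum(G**2 / (H_diag + eps), axis=1)

# Toy data with controlled label noise: flip 20% of one class's labels,
# mirroring the corruption protocol described in the abstract.
rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
y = (X @ rng.normal(size=d) > 0).astype(float)
flip = rng.choice(np.where(y == 1)[0],
                  size=int(0.2 * (y == 1).sum()), replace=False)
y_noisy = y.copy()
y_noisy[flip] = 0.0

w = fit_logreg(X, y_noisy)
scores = self_influence(X, y_noisy, w)

# Mislabeled samples should, on average, receive higher influence scores.
print(scores[flip].mean() > np.delete(scores, flip).mean())
```

In this toy setup the flipped samples sit on the wrong side of the learned boundary, so their loss gradients (and hence self-influence scores) are large, which is the detection signal the abstract describes. The paper's actual method operates on deep models and estimates influence against a validation set, which this linear sketch does not capture.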