Adaptive Label Error Detection: A Bayesian Approach to Mislabeled Data Detection

📅 2026-01-15

🤖 AI Summary
Label noise significantly degrades model performance, which makes effective detection methods necessary. This work proposes Adaptive Label Error Detection (ALED), which first extracts and denoises intermediate features from a deep convolutional network, then models each class as a multivariate Gaussian distribution on a low-dimensional manifold, and finally identifies mislabeled samples via a Bayesian likelihood ratio test. ALED integrates feature denoising, class-conditional Gaussian modeling, and likelihood ratio testing within a single framework. On multiple medical imaging datasets, it shows markedly higher sensitivity than established label error detection methods without compromising precision. In one reported example, fine-tuning a network on data corrected by ALED reduced test-set errors by 33.8%.
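One way to write the class-conditional Gaussian model and the likelihood ratio test described above, as a sketch only. The symbols, the class priors $\pi_k$, the threshold $\tau$, and the choice of comparing the labeled class against the most likely alternative class are assumptions; the summary does not state the exact test statistic. Here $z_i \in \mathbb{R}^d$ is the denoised feature of sample $i$, $y_i$ its given label, and $(\mu_k, \Sigma_k)$ the fitted mean and covariance of class $k$.

```latex
\log p(z \mid k) = -\tfrac{1}{2}\Big[(z-\mu_k)^\top \Sigma_k^{-1} (z-\mu_k) + \log\det\Sigma_k + d\log 2\pi\Big]

\Lambda_i = \log \frac{\pi_{y_i}\, p(z_i \mid y_i)}{\max_{k \neq y_i} \pi_k\, p(z_i \mid k)},
\qquad \text{flag sample } i \text{ as mislabeled if } \Lambda_i < \tau
```

With $\tau = 0$, a sample is flagged exactly when some other class explains its features better (after weighting by priors) than the class it was labeled with.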


📝 Abstract
Machine learning classification systems are susceptible to poor performance when trained with incorrect ground truth labels, even when data is well-curated by expert annotators. As machine learning becomes more widespread, it is increasingly imperative to identify and correct mislabeling to develop more powerful models. In this work, we motivate and describe Adaptive Label Error Detection (ALED), a novel method of detecting mislabeling. ALED extracts an intermediate feature space from a deep convolutional neural network, denoises the features, models the reduced manifold of each class with a multidimensional Gaussian distribution, and performs a simple likelihood ratio test to identify mislabeled samples. We show that ALED has markedly increased sensitivity, without compromising precision, compared to established label error detection methods, on multiple medical imaging datasets. We demonstrate an example where fine-tuning a neural network on corrected data results in a 33.8% decrease in test set errors, providing strong benefits to end users. The ALED detector is deployed in the Python package statlab.
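A minimal NumPy sketch of the pipeline the abstract describes, assuming the intermediate CNN features have already been extracted into an array. PCA is swapped in as a stand-in for the denoising and manifold-reduction step, and the test compares the labeled class against the most likely alternative class. These choices, and every function name here (`pca_denoise`, `fit_class_gaussians`, `flag_label_errors`), are hypothetical illustrations. They are not the ALED algorithm as published or the `statlab` API.

```python
import numpy as np


def pca_denoise(features, n_components):
    """Hypothetical denoising step: project centered features onto the top principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def fit_class_gaussians(z, labels, reg=1e-3):
    """Fit one multivariate Gaussian (mean, regularized covariance) per class."""
    params = {}
    for k in np.unique(labels):
        zk = z[labels == k]
        mu = zk.mean(axis=0)
        cov = np.cov(zk, rowvar=False) + reg * np.eye(z.shape[1])
        params[k] = (mu, cov)
    return params


def gaussian_logpdf(z, mu, cov):
    """Log density of each row of z under N(mu, cov)."""
    d = z.shape[1]
    diff = z - mu
    _, logdet = np.linalg.slogdet(cov)
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)


def flag_label_errors(features, labels, n_components=2, threshold=0.0):
    """Flag samples whose labeled class is less plausible than the best alternative class."""
    z = pca_denoise(features, n_components)
    params = fit_class_gaussians(z, labels)
    classes = np.unique(labels)
    # Column k holds log prior + log likelihood of every sample under class k.
    scores = np.column_stack([
        np.log(np.mean(labels == k)) + gaussian_logpdf(z, *params[k]) for k in classes
    ])
    rows = np.arange(len(labels))
    own_col = np.searchsorted(classes, labels)
    own = scores[rows, own_col]
    others = scores.copy()
    others[rows, own_col] = -np.inf
    log_ratio = own - others.max(axis=1)
    return log_ratio < threshold, log_ratio


# Usage on synthetic data: two well-separated classes in 5-D, two labels flipped.
rng = np.random.default_rng(0)
n = 200
true_labels = np.repeat([0, 1], n // 2)
centers = np.array([[0.0] * 5, [4.0] * 5])
features = centers[true_labels] + rng.normal(size=(n, 5))
noisy_labels = true_labels.copy()
flipped = np.array([3, 150])
noisy_labels[flipped] = 1 - noisy_labels[flipped]
flags, log_ratio = flag_label_errors(features, noisy_labels)
print("flagged indices:", np.flatnonzero(flags))
```

Fitting each class on its noisy labels means mislabeled points slightly inflate the covariance of the class they were assigned to. The small ridge term `reg` is an assumed safeguard that keeps every covariance invertible.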
Problem


label error detection
mislabeled data
classification
data quality
machine learning
Innovation


Adaptive Label Error Detection
Bayesian approach
feature denoising
Gaussian class modeling
likelihood ratio test