Calibration improves detection of mislabeled examples

📅 2025-11-04

🤖 AI Summary
This study addresses the low accuracy and poor robustness of label noise detection in machine learning. We propose a framework based on model calibration for identifying mislabeled samples. Methodologically, we validate that calibration makes predicted probabilities more reliable, and we design an instance-level trust score mechanism that applies multiple calibration strategies, including temperature scaling and vector scaling, to refine the base model's outputs. Our key contributions are: (1) establishing a positive correlation between model calibration quality and mislabeling detection performance; (2) significantly improving the stability of trust scores across diverse noise types and intensities; and (3) achieving an average 12.6% improvement in detection accuracy on multiple real-world datasets, which supports data cleaning and relabeling. The approach is designed to be practical to deploy in industrial settings.
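To make the temperature scaling step mentioned above concrete, here is a minimal sketch, not the authors' implementation. It assumes the base model exposes raw logits on a held-out set, and it fits a single temperature by a simple grid search over the negative log-likelihood. The function names (`softmax`, `nll`, `fit_temperature`) and the grid bounds are illustrative choices, not taken from the paper.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels, temperature):
    """Mean negative log-likelihood of the labels after dividing logits by a temperature."""
    probs = softmax(logits / temperature)
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + 1e-12)))

def fit_temperature(logits, labels, grid=None):
    """Pick the temperature on a grid that minimizes held-out NLL.

    The grid (0.25 to 10) is an assumption made for this sketch; a
    gradient-based optimizer would work as well.
    """
    if grid is None:
        grid = np.linspace(0.25, 10.0, 200)
    losses = [nll(logits, labels, t) for t in grid]
    return float(grid[int(np.argmin(losses))])
```

A fitted temperature above 1 softens overconfident probabilities, and one below 1 sharpens underconfident ones. Vector scaling generalizes this by learning a per-class scale and bias instead of a single scalar.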

📝 Abstract
Mislabeled data is a pervasive issue that undermines the performance of machine learning systems in real-world applications. An effective way to mitigate this problem is to detect mislabeled instances and give them special treatment, such as filtering or relabeling. Automatic mislabeling detection methods typically train a base machine learning model and then probe it on each instance to obtain a trust score indicating whether the provided label is genuine or incorrect. The properties of this base model are thus of paramount importance. In this paper, we investigate the impact of calibrating this model. Our empirical results show that using calibration methods improves the accuracy and robustness of mislabeled instance detection, providing a practical and effective solution for industrial applications.
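The detection pipeline described in the abstract can be sketched as follows. This is an illustration under stated assumptions rather than the paper's method: the trust score is taken to be the calibrated probability the base model assigns to the provided label, calibration is a single temperature applied to the logits, and the helper names (`trust_scores`, `flag_suspects`) are hypothetical. In practice the logits would normally come from out-of-sample predictions, for example via cross-validation, so the model has not memorized the labels it is scoring.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Row-wise softmax of temperature-scaled logits."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def trust_scores(logits, given_labels, temperature=1.0):
    """Trust score per instance: calibrated probability of the provided label."""
    probs = softmax(logits, temperature)
    given_labels = np.asarray(given_labels)
    return probs[np.arange(len(given_labels)), given_labels]

def flag_suspects(scores, fraction):
    """Return indices of the lowest-trust instances (the given fraction of the data)."""
    k = int(round(fraction * len(scores)))
    return np.argsort(scores)[:k]

# Toy example: the last instance is labeled 1 although the model favors class 0.
logits = np.array([[4.0, 0.0, 0.0],
                   [0.0, 4.0, 0.0],
                   [0.0, 0.0, 4.0],
                   [4.0, 0.0, 0.0]])
labels = np.array([0, 1, 2, 1])
scores = trust_scores(logits, labels, temperature=2.0)
suspects = flag_suspects(scores, fraction=0.25)
```

The flagged instances are candidates for filtering or relabeling. The fraction to flag is a design choice that depends on the expected noise rate, which this sketch leaves to the user.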
Problem

Research questions and friction points this paper is trying to address.

Calibration enhances detection of mislabeled data instances
Mislabeled data undermines machine learning system performance
Calibration improves accuracy and robustness in mislabel detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Calibration enhances mislabel detection accuracy
Calibration boosts robustness in mislabel detection
Calibration provides practical industrial solution