The Re-Label Method For Data-Centric Machine Learning

📅 2023-02-09
🏛️ arXiv.org
📈 Citations: 2
Influential: 0

🤖 AI Summary
To address noisy human-annotated data in industrial deep learning, where label noise degrades model performance, this paper proposes a lightweight, model-agnostic, human-in-the-loop data cleaning method. The method flags suspicious labels by combining model prediction confidence with cross-model or cross-iteration prediction consistency, then sends the flagged samples to human review and relabeling, with no change to the model architecture or training pipeline. It applies to classification, sequence labeling, object detection, text generation, and CTR prediction, and is compatible with mainstream annotation platforms. The stated target is a score above 90 on the dev dataset. Dev-set results and human evaluation across these tasks are reported as supporting the method.
📝 Abstract
In industrial deep learning applications, our manually labeled data contains a certain amount of noisy data. To solve this problem and achieve a score above 90 on the dev dataset, we present a simple method that finds the noisy data and has humans re-label it, with the model predictions given as references during human labeling. In this paper, we illustrate our idea for a broad set of deep learning tasks, including classification, sequence tagging, object detection, sequence generation, and click-through rate prediction. The dev dataset evaluation results and human evaluation results verify our idea.
Problem

Research questions and friction points this paper is trying to address.

Identify and correct noisy data in manually labeled datasets.
Improve model performance to reach a score above 90 on the dev dataset.
Apply the method across various deep learning tasks for validation.
Innovation

Methods, ideas, or system contributions that make the work stand out.

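Find noisy samples with the help of model predictions.
Have human annotators re-label those samples, with the model predictions shown as references.
Applicable to classification, sequence tagging, object detection, sequence generation, and click-through rate prediction.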
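
As a rough illustration of the find-then-re-label loop for the classification case only, the sketch below flags samples whose model prediction disagrees with the stored human label and queues them for review, most confident disagreements first. The disagreement rule, the confidence ranking, the `min_confidence` threshold, and the names `Suspect` and `find_suspects` are all assumptions made for this example; the summary above does not spell out the paper's exact selection criterion.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Suspect:
    index: int         # position of the sample in the dataset
    human_label: int   # label currently stored in the dataset
    model_label: int   # model's top prediction, shown to the annotator as a reference
    confidence: float  # model probability assigned to its top prediction


def find_suspects(
    labels: Sequence[int],
    probs: Sequence[Sequence[float]],
    min_confidence: float = 0.0,
) -> List[Suspect]:
    """Return samples whose model prediction disagrees with the human label.

    Assumed rule (hypothetical, not taken from the paper): a sample is
    suspicious when the arg-max class differs from the stored label and the
    model's confidence is at least `min_confidence`.
    """
    suspects = []
    for i, (y, p) in enumerate(zip(labels, probs)):
        pred = max(range(len(p)), key=p.__getitem__)  # arg-max class
        if pred != y and p[pred] >= min_confidence:
            suspects.append(Suspect(i, y, pred, p[pred]))
    # Review the most confident disagreements first.
    suspects.sort(key=lambda s: s.confidence, reverse=True)
    return suspects


# Toy data: four samples, two classes. `probs` stands in for the output of
# a model trained on the (possibly noisy) labels.
labels = [0, 1, 1, 0]
probs = [
    [0.90, 0.10],  # agrees with label 0
    [0.80, 0.20],  # disagrees with label 1
    [0.30, 0.70],  # agrees with label 1
    [0.45, 0.55],  # disagrees with label 0, low confidence
]

review_queue = find_suspects(labels, probs)
for s in review_queue:
    print(f"sample {s.index}: human={s.human_label}, model={s.model_label}, conf={s.confidence:.2f}")
```

In this sketch the human annotator makes the final call on each queued sample; the model prediction is only a reference, matching the abstract's description. Raising `min_confidence` shrinks the review queue at the cost of missing some noisy labels.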