From Label Error Detection to Correction: A Modular Framework and Benchmark for Object Detection Datasets

📅 2025-08-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses pervasive labeling errors—such as missing annotations, misclassifications, and imprecise bounding boxes—in object detection datasets. We propose REC✓D, a semi-automated correction framework that leverages pre-trained detectors to generate candidate mislabeling suggestions, employs lightweight crowdsourced micro-tasks for independent human verification by multiple annotators, and applies response aggregation to quantify annotation ambiguity and improve correction robustness. To our knowledge, REC✓D is the first scalable, end-to-end system for detecting and correcting label errors in object detection data. As a key contribution, we release a rigorously validated, high-quality subset of pedestrian annotations from KITTI as a new benchmark. Experiments demonstrate that existing methods fail to detect up to 66% of ground-truth labeling errors, whereas REC✓D identifies and rectifies at least 24% of the original annotation errors at a fraction of the cost of full manual re-annotation, substantially improving dataset quality and the reliability of model evaluation.

📝 Abstract
Object detection has advanced rapidly in recent years, driven by increasingly large and diverse datasets. However, label errors, defined as missing labels, incorrect classification, or inaccurate localization, often compromise the quality of these datasets. This can have a significant impact on the outcomes of training and benchmark evaluations. Although several methods now exist for detecting label errors in object detection datasets, they are typically validated only on synthetic benchmarks or by limited manual inspection. How to correct such errors systematically and at scale therefore remains an open problem. We introduce a semi-automated framework for label-error correction called REC✓D (Rechecked). Building on existing detectors, the framework pairs their error proposals with lightweight, crowd-sourced microtasks. These tasks enable multiple annotators to independently verify each candidate bounding box, and their responses are aggregated to estimate ambiguity and improve label quality. To demonstrate the effectiveness of REC✓D, we apply it to the class pedestrian in the KITTI dataset. Our crowdsourced review yields high-quality corrected annotations, which indicate that at least 24% of the original annotations are missing or inaccurate. This validated set will be released as a new real-world benchmark for label error detection and correction. We show that current label error detection methods, when combined with our correction framework, can recover hundreds of errors in the time it would take a human to annotate bounding boxes from scratch. However, even the best methods still miss up to 66% of the true errors, and with low-quality labels they introduce more errors than they find. This highlights the urgent need for further research, now enabled by our released benchmark.
Problem

Research questions and friction points this paper is trying to address.

Detect and correct label errors in object detection datasets
Improve label quality through crowdsourced verification
Address missing and inaccurate annotations in benchmark datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semi-automated framework for label-error correction
Pairs error proposals with crowd-sourced microtasks
Aggregates annotator responses to improve label quality
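The aggregation idea above can be sketched as a simple majority vote over independent annotator verdicts, with disagreement serving as an ambiguity score. This is a minimal illustration, not the paper's actual aggregation method; the function name and verdict labels are assumptions.

```python
from collections import Counter

def aggregate_responses(votes):
    """Aggregate independent annotator verdicts for one candidate box.

    votes: list of verdict strings, e.g. ["correct", "correct", "error"].
    Returns (majority verdict, agreement ratio, ambiguity score),
    where ambiguity is high when annotators disagree.
    """
    if not votes:
        raise ValueError("need at least one vote")
    counts = Counter(votes)
    verdict, top = counts.most_common(1)[0]
    agreement = top / len(votes)      # fraction backing the majority
    ambiguity = 1.0 - agreement       # 0.0 = unanimous, approaches 1.0 with disagreement
    return verdict, agreement, ambiguity
```

For example, three annotators voting `["correct", "correct", "error"]` would yield the majority verdict `"correct"` with agreement ≈ 0.67, flagging the box as moderately ambiguous rather than settled.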