Calibrating Probabilistic Object Detectors with Annotator Disagreement

📅 2026-05-23
🤖 AI Summary
This work proposes a method for calibrating probabilistic object detectors without access to ground-truth annotations, targeting ambiguous objects on which annotators disagree, as is common in medical imaging. The approach aligns the detector's classification confidence and bounding box variance with the distribution of annotations from multiple annotators, using a train-time calibration step and a post-hoc calibrator. The study also introduces four metrics that measure classification and localization calibration error in the absence of ground truth. Experiments on real-world and synthetic datasets of medical and natural images report superior performance with three popular object detectors; the framework is described as applicable to detectors such as the YOLO family and two-stage detectors.
📝 Abstract
High degrees of disagreement among annotators can exist for ambiguous objects, e.g., in medical images, underscoring the challenges of establishing ground truth annotations in object detection tasks. Despite this, all existing object detectors implicitly require access to ground truth annotations for either training or evaluation. The fundamental questions we target are: How can we learn an object detector from multiple annotators' annotations when object ambiguity means no objective ground truth is available, and how can we enable the learned detector to express meaningful predictive uncertainty when detecting ambiguous objects? To answer these questions, we present an interpretable approach to calibrate probabilistic object detectors, where the calibration goal is to align the class confidence and bounding box variance estimates to the annotators' annotation distribution. We introduce an efficient yet effective framework to calibrate probabilistic object detectors by designing four evaluation metrics to measure calibration errors regarding classification and localization, and by proposing a train-time calibration method and a post-hoc calibrator, all without the need to access any ground truth. This framework is generalizable to many existing probabilistic object detectors, such as the YOLO family and two-stage detectors. Empirical results with real-world and synthetic datasets of medical and natural images demonstrate the superior performance of the proposed framework with three popular object detectors.
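To make the calibration goal in the abstract concrete, the sketch below compares a detector's confidence and box variance for one object against simple statistics of the annotators' labels. It is only an illustration under stated assumptions: the abstract does not define the paper's four metrics, so the agreement fraction, the per-coordinate variance, the absolute-difference gaps, and the function names `annotator_targets` and `calibration_gaps` are all hypothetical stand-ins, not the authors' method.

```python
import numpy as np


def annotator_targets(votes, boxes):
    """Summarize annotator labels for one candidate object.

    votes: 0/1 per annotator, 1 if that annotator marked the object.
    boxes: (K, 4) boxes (x1, y1, x2, y2) from the K annotators who marked it.
    Returns the agreement fraction and the per-coordinate mean and variance.
    """
    votes = np.asarray(votes, dtype=float)
    boxes = np.asarray(boxes, dtype=float)
    agreement = float(votes.mean())   # fraction of annotators who marked the object
    box_mean = boxes.mean(axis=0)     # average annotated box
    box_var = boxes.var(axis=0)       # population variance per coordinate
    return agreement, box_mean, box_var


def calibration_gaps(pred_conf, pred_var, votes, boxes):
    """Hypothetical calibration gaps (assumption, not the paper's metrics).

    Classification gap: |predicted confidence - annotator agreement|.
    Localization gap: mean |predicted variance - annotator variance| over coordinates.
    """
    agreement, _, box_var = annotator_targets(votes, boxes)
    cls_gap = float(abs(pred_conf - agreement))
    pred_var = np.asarray(pred_var, dtype=float)
    loc_gap = float(np.mean(np.abs(pred_var - box_var)))
    return cls_gap, loc_gap


# Example: 3 of 4 annotators marked the object, with slightly shifted boxes.
votes = [1, 1, 1, 0]
boxes = [[10, 10, 50, 50], [12, 10, 52, 50], [14, 10, 54, 50]]
cls_gap, loc_gap = calibration_gaps(0.9, [8 / 3, 0.0, 8 / 3, 0.0], votes, boxes)
```

In this example the detector is overconfident relative to the annotators (0.9 against an agreement of 0.75), while its predicted box variance matches the spread of the annotated boxes, so only the classification gap is nonzero.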
Problem

Research questions and friction points this paper is trying to address.

annotator disagreement
probabilistic object detection
calibration
ambiguous objects
ground truth uncertainty
Innovation

Methods, ideas, or system contributions that make the work stand out.

probabilistic object detection
annotator disagreement
calibration without ground truth
uncertainty quantification
bounding box variance