🤖 AI Summary
This work addresses the challenge of achieving both coverage guarantees and tight prediction sets in safety-critical object detection, including under distribution shift. The authors propose a two-stage conformal prediction framework. It first constructs class prediction sets with RAPS, then builds conformalized bounding boxes conditioned on the predicted class set. For the boxes, conformal prediction is applied coordinate-wise to the box corners with a Bonferroni correction, so that the per-coordinate guarantees combine into a box-level marginal coverage guarantee. To tighten the intervals, the authors scale them with input-adaptive aleatoric uncertainty estimates from a probabilistic detector trained with loss attenuation. The study presents the first systematic comparison of scaled and unscaled conformal prediction for multi-class object detection. Experiments on the KITTI, BDD, and CODA benchmarks show that scaled CP preserves coverage while achieving up to 19% higher IoU and 39% lower interval scores than unscaled CP. Class-wise calibration further improves coverage for both variants, with a negligible effect on tightness.
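The coordinate-wise box construction can be illustrated with a short sketch. It is not the authors' code. It assumes NumPy, absolute residuals |y - ŷ| as the unscaled nonconformity score, and residuals divided by a per-coordinate standard deviation σ as the scaled score. It also assumes a synthetic detector whose σ equals the true noise level; a real loss-attenuation detector would only estimate it. The function names (`calibrate`, `intervals`, `interval_score`, `make_split`) are hypothetical. Splitting α evenly across the four corner coordinates is the Bonferroni step: if each coordinate misses with probability at most α/4, the union bound gives box-level coverage of at least 1 - α.

```python
import numpy as np

def conformal_quantile(scores, alpha):
    """(1 - alpha) split-conformal quantile with the (n + 1) finite-sample correction."""
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)  # clipped for very small n
    return np.quantile(scores, level, method="higher")

def calibrate(pred, gt, alpha, sigma=None):
    """Per-coordinate quantiles; Bonferroni splits alpha over the 4 box coordinates."""
    residuals = np.abs(gt - pred)            # unscaled nonconformity score
    if sigma is not None:
        residuals = residuals / sigma        # scaled score: residual / predicted std
    alpha_coord = alpha / pred.shape[1]      # alpha / 4 per coordinate
    return np.array([conformal_quantile(residuals[:, j], alpha_coord)
                     for j in range(pred.shape[1])])

def intervals(pred, q, sigma=None):
    """Symmetric intervals: pred +/- q (unscaled) or pred +/- q * sigma (scaled)."""
    half = q if sigma is None else q * sigma
    return pred - half, pred + half

def interval_score(lo, hi, y, alpha):
    """Mean interval score (Gneiting & Raftery); lower is better."""
    below = (2 / alpha) * (lo - y) * (y < lo)
    above = (2 / alpha) * (y - hi) * (y > hi)
    return float(np.mean(hi - lo + below + above))

# Synthetic data: hypothetical detector outputs with heteroscedastic noise.
rng = np.random.default_rng(0)

def make_split(n):
    pred = rng.uniform(0, 500, size=(n, 4))          # predicted corners (x1, y1, x2, y2)
    sigma = rng.uniform(1, 10, size=(n, 4))          # stand-in aleatoric std per coordinate
    gt = pred + sigma * rng.standard_normal((n, 4))  # ground-truth corners
    return pred, gt, sigma

alpha = 0.1
cal_pred, cal_gt, cal_sigma = make_split(2000)
test_pred, test_gt, test_sigma = make_split(2000)

results = {}
for name, use_sigma in [("unscaled", False), ("scaled", True)]:
    s_cal = cal_sigma if use_sigma else None
    s_test = test_sigma if use_sigma else None
    q = calibrate(cal_pred, cal_gt, alpha, s_cal)
    lo, hi = intervals(test_pred, q, s_test)
    # A box counts as covered only if all 4 coordinates fall inside their intervals.
    box_covered = np.all((test_gt >= lo) & (test_gt <= hi), axis=1)
    results[name] = {
        "coverage": float(box_covered.mean()),
        "mean_width": float(np.mean(hi - lo)),
        "interval_score": interval_score(lo, hi, test_gt, alpha / 4),
    }
    print(name, results[name])
```

Because σ varies across inputs, the scaled intervals shrink for confident predictions and widen for uncertain ones. In this synthetic setup, that makes their mean width and interval score lower than those of the fixed-width unscaled intervals at similar box-level coverage.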
📝 Abstract
Conformal Prediction (CP) is a distribution-free method for constructing prediction sets with marginal finite-sample coverage guarantees, making it a suitable framework for reliable uncertainty quantification in safety-critical object detection. However, object detection produces structured multi-output predictions, which complicates the application of classical CP theory developed for single outputs. In addition, standard (unscaled) CP produces prediction intervals of fixed width across inputs, making them unnecessarily wide for low-uncertainty predictions. While scaled CP addresses this by adapting the interval width to an input-dependent uncertainty estimate, prior work has neither systematically compared unscaled and scaled CP for multi-class object detection, nor integrated CP with a complementary uncertainty quantification method in this setting. We fill this gap by: (i) applying CP coordinate-wise to bounding box corners with a Bonferroni correction for box-level guarantees; (ii) scaling the resulting intervals using per-prediction aleatoric uncertainty estimates derived from a probabilistic object detector trained with loss attenuation, evaluated in an uncalibrated and two calibrated variants; (iii) extending this to a two-step pipeline that first constructs class prediction sets using RAPS and then conditions the conformalized bounding boxes on the predicted class set. Across three autonomous driving datasets (KITTI, BDD, CODA), including a cross-domain setting under distribution shift, scaled CP consistently improves interval sharpness over unscaled CP, achieving up to 19% higher IoU and 39% lower interval scores, without sacrificing coverage. Class-wise calibration further improves coverage for both variants with a negligible effect on sharpness. Together, these improvements yield more actionable uncertainty estimates for real-time, real-world object detection.
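The class stage can be sketched as follows. This sketch computes randomized RAPS scores: the softmax mass ranked above a class, plus a random fraction of that class's own probability, plus the penalty λ·(o(y) - k_reg)^+ on its rank o(y). It then thresholds these scores at a split-conformal quantile. It assumes NumPy and a synthetic classifier whose labels are drawn from its own softmax. The values `lam=0.01` and `k_reg=2` are illustrative, and the helper names are hypothetical. It does not attempt to reproduce how the paper splits the error budget between the class and box stages.

```python
import numpy as np

def conformal_quantile(scores, alpha):
    """(1 - alpha) split-conformal quantile with the (n + 1) finite-sample correction."""
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(scores, level, method="higher")

def raps_scores(probs, u, lam=0.01, k_reg=2):
    """Randomized RAPS score for every class, shape (n, K)."""
    order = np.argsort(-probs, axis=1)                     # classes by descending prob
    sorted_p = np.take_along_axis(probs, order, axis=1)
    cum_above = np.cumsum(sorted_p, axis=1) - sorted_p     # mass ranked strictly above
    rank = np.arange(1, probs.shape[1] + 1)                # 1-indexed rank o(y)
    sorted_scores = (cum_above + u[:, None] * sorted_p
                     + lam * np.maximum(rank - k_reg, 0))  # lam * (o(y) - k_reg)^+
    scores = np.empty_like(probs)
    np.put_along_axis(scores, order, sorted_scores, axis=1)  # back to class order
    return scores

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
K, alpha = 5, 0.1

def make_split(n):
    probs = softmax(2.0 * rng.standard_normal((n, K)))    # hypothetical classifier head
    # Draw each label from its own softmax row (inverse-CDF sampling).
    labels = (probs.cumsum(axis=1) > rng.random((n, 1))).argmax(axis=1)
    return probs, labels

cal_probs, cal_labels = make_split(2000)
test_probs, test_labels = make_split(2000)

# Calibrate: threshold = conformal quantile of the true-label scores.
cal_all = raps_scores(cal_probs, rng.random(len(cal_probs)))
tau = conformal_quantile(cal_all[np.arange(len(cal_labels)), cal_labels], alpha)

# Predict: keep every class whose score is at or below the threshold.
test_all = raps_scores(test_probs, rng.random(len(test_probs)))
pred_sets = test_all <= tau                                # boolean mask (n, K)
coverage = float(pred_sets[np.arange(len(test_labels)), test_labels].mean())
avg_size = float(pred_sets.sum(axis=1).mean())
print(f"coverage={coverage:.3f}, average set size={avg_size:.2f}")
```

Setting `lam=0` recovers the adaptive prediction sets (APS) score. A positive penalty discourages large sets by charging for every class ranked beyond `k_reg`.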