🤖 AI Summary
This work addresses insufficient calibration of classification confidence in 3D object detectors for autonomous driving, with particular emphasis on calibrating the full predictive distribution over all classes rather than only the top-scoring (dominant) class. We propose two auxiliary loss terms that make calibration a training objective, targeting either the dominant prediction or the entire class-wise prediction vector, and evaluate them together with post-hoc techniques such as isotonic regression on CenterPoint, PillarNet, and DSVT-Pillar. To evaluate calibration beyond the top prediction, we design a metric that covers both dominant and secondary class predictions. Experiments show clear improvements in calibration, reducing Expected Calibration Error (ECE) by 35%–52% for dominant and secondary class predictions on CenterPoint and PillarNet. For DSVT-Pillar, however, our analysis reveals a trade-off between dominant- and secondary-class calibration: no single method calibrates both, which poses an open challenge for future research.
📝 Abstract
In autonomous systems, precise object detection and uncertainty estimation are critical for self-aware and safe operation. This work addresses confidence calibration for the classification task of 3D object detectors. We argue that it is necessary to consider the calibration of the full predictive confidence distribution over all classes, and we derive a metric that captures the calibration of dominant and secondary class predictions. We propose two auxiliary regularizing loss terms that introduce calibration of either the dominant prediction or the full prediction vector as a training goal. We evaluate a range of post-hoc and train-time methods for CenterPoint, PillarNet, and DSVT-Pillar and find that combining isotonic regression with our loss term that regularizes for calibration of the full class prediction leads to the best calibration of CenterPoint and PillarNet with respect to both dominant and secondary class predictions. We further find that DSVT-Pillar cannot be jointly calibrated for dominant and secondary predictions using the same method.
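To make the distinction between dominant and secondary calibration concrete, the following is a minimal sketch of a standard binned ECE evaluated separately on the highest-scoring and second-highest-scoring class of each prediction vector. It is not the metric defined in the paper, whose exact form is not given here. The function names, the choice of 10 equal-width bins, and the toy scores are illustrative assumptions, and the sketch assumes each detection has already been matched to a ground-truth label.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard binned ECE: sample-weighted mean of |accuracy - confidence| per bin."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0
    conf_sum = [0.0] * n_bins
    hit_sum = [0.0] * n_bins
    for conf, hit in zip(confidences, correct):
        # Equal-width bins on [0, 1]; a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        conf_sum[idx] += conf
        hit_sum[idx] += 1.0 if hit else 0.0
    # (count / n) * |hits / count - conf / count| simplifies to |hits - conf| / n.
    return sum(abs(h - c) for h, c in zip(hit_sum, conf_sum)) / n


def rank_calibration_inputs(probs, labels, rank):
    """Select the class with the rank-th highest score (0 = dominant, 1 = secondary).

    Returns that class's confidence and whether it equals the matched label.
    """
    confidences, correct = [], []
    for scores, label in zip(probs, labels):
        order = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
        cls = order[rank]
        confidences.append(scores[cls])
        correct.append(cls == label)
    return confidences, correct


# Toy class-score vectors for four matched detections (hypothetical values).
probs = [
    [0.72, 0.25, 0.03],
    [0.62, 0.35, 0.03],
    [0.52, 0.45, 0.03],
    [0.82, 0.15, 0.03],
]
labels = [0, 1, 0, 0]

dominant_ece = expected_calibration_error(*rank_calibration_inputs(probs, labels, 0))
secondary_ece = expected_calibration_error(*rank_calibration_inputs(probs, labels, 1))
print(f"dominant ECE:  {dominant_ece:.3f}")
print(f"secondary ECE: {secondary_ece:.3f}")
```

Reporting the two values side by side shows why a method that lowers the dominant-class error can leave the secondary-class error unchanged or worse, which is the kind of trade-off the abstract reports for DSVT-Pillar.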