🤖 AI Summary
Deep learning models for medical image segmentation typically lack statistical guarantees on their false positive rate (FPR), which hinders trustworthy deployment in clinical decision-making. To address this, we propose a conformal prediction-based post-processing framework for FPR control in segmentation. It requires no retraining and provides image-level FPR calibration for any pre-trained segmentation model. The method generates a nested family of contracted masks from the model's raw predictions (by raising the score threshold or applying morphological erosion) and uses conformal prediction on a labeled calibration set to select a single contraction parameter that satisfies a user-specified FPR tolerance. The resulting guarantees are distribution-free and hold in finite samples, provided new images are exchangeable with the calibration data. The framework is plug-and-play, model-agnostic, and computationally efficient. On a polyp segmentation benchmark, it empirically attains the target level of false-positive control, supporting more reliable segmentation in settings where over-segmentation has clinical consequences.
📝 Abstract
Reliable semantic segmentation is essential for clinical decision making, yet deep models rarely provide explicit statistical guarantees on their errors. We introduce a simple post-hoc framework that constructs confidence masks with distribution-free, image-level control of false-positive predictions. Given any pretrained segmentation model, we define a nested family of shrunken masks obtained either by increasing the score threshold or by applying morphological erosion. A labeled calibration set is used to select a single shrink parameter via conformal prediction, ensuring that, for new images that are exchangeable with the calibration data, the proportion of false positives retained in the confidence mask stays below a user-specified tolerance with high probability. The method is model-agnostic, requires no retraining, and provides finite-sample guarantees regardless of the underlying predictor. Experiments on a polyp-segmentation benchmark demonstrate target-level empirical validity. Our framework enables practical, risk-aware segmentation in settings where over-segmentation can have clinical consequences. Code at https://github.com/deel-ai-papers/conseco.
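The calibration step described in the abstract can be sketched as follows for the thresholding variant. This is a minimal illustration under stated assumptions, not the authors' implementation (see the linked repository for that). The function names, the threshold grid, and the definition of the false-positive proportion as the share of predicted pixels lying outside the ground truth are all hypothetical choices made for this example. For each calibration image, the sketch finds the smallest threshold from which the false-positive proportion stays within the tolerance, then takes a conformal quantile of those per-image thresholds.

```python
import math
import numpy as np


def fp_proportion(mask, truth):
    """Share of predicted pixels that fall outside the ground truth (assumed loss)."""
    mask = np.asarray(mask, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    n_pred = mask.sum()
    if n_pred == 0:
        return 0.0  # an empty mask contains no false positives
    return float((mask & ~truth).sum() / n_pred)


def per_image_score(probs, truth, grid, tol):
    """Smallest grid threshold from which the loss stays <= tol for all larger ones.

    Returns np.inf if even the largest grid threshold violates the tolerance,
    which corresponds to shrinking the mask to the empty set.
    """
    score = np.inf
    for lam in sorted(grid, reverse=True):
        if fp_proportion(np.asarray(probs) >= lam, truth) > tol:
            break
        score = float(lam)
    return score


def conformal_quantile(scores, alpha):
    """k-th smallest score with k = ceil((n + 1) * (1 - alpha)); inf if k > n."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return np.inf
    return float(np.sort(np.asarray(scores, dtype=float))[k - 1])


def calibrate_threshold(cal_probs, cal_truths, grid, tol, alpha):
    """Pick one threshold from a labeled calibration set of (scores, mask) pairs."""
    scores = [per_image_score(p, t, grid, tol) for p, t in zip(cal_probs, cal_truths)]
    return conformal_quantile(scores, alpha)


def confidence_mask(probs, lam_hat):
    """Shrunken mask for a new image: keep only pixels scoring at least lam_hat."""
    return np.asarray(probs) >= lam_hat
```

Under exchangeability of calibration and test images, the threshold returned by `calibrate_threshold` keeps the false-positive proportion of `confidence_mask` within `tol` with probability at least `1 - alpha`. Defining the per-image score over all larger thresholds avoids assuming the loss is monotone in the threshold. The erosion variant would replace the threshold grid with a grid of erosion radii.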