🤖 AI Summary
In medical image segmentation, fixed thresholds (e.g., 0.5) provide no statistical guarantee on the false negative rate (FNR), which undermines reliability in high-stakes clinical applications. To address this, we introduce conformal prediction to 3D lesion segmentation for the first time, proposing a risk-constrained framework. It holds out a calibration set, uses an FNR-specific loss function to find the critical threshold at which each calibration sample just meets the FNR tolerance, and sets the test-time confidence threshold by quantile estimation over those critical thresholds, so that the test-time FNR stays below a user-specified tolerance at the chosen risk level. The method is agnostic to backbone architecture and is validated on six 3D lesion datasets with five state-of-the-art segmentation models. It significantly reduces FNR while preserving high segmentation accuracy (e.g., improving Dice scores by 1.2–3.8%), enabling clinically deployable, risk-controlled, and interpretable segmentation with formal statistical guarantees.
📝 Abstract
Medical image segmentation serves as a critical component of precision medicine, enabling accurate localization and delineation of pathological regions, such as lesions. However, existing models empirically apply fixed thresholds (e.g., 0.5) to differentiate lesions from the background, offering no statistical guarantees on key metrics such as the false negative rate (FNR). This lack of principled risk control undermines their reliable deployment in high-stakes clinical applications, especially in challenging scenarios like 3D lesion segmentation (3D-LS). To address this issue, we propose a risk-constrained framework, termed Conformal Lesion Segmentation (CLS), that calibrates data-driven thresholds via conformalization to ensure the test-time FNR remains below a target tolerance $\varepsilon$ under desired risk levels. Drawing on the idea of conformal prediction, CLS begins by holding out a calibration set and analyzing, for each sample, which threshold settings satisfy the FNR tolerance. We define an FNR-specific loss function and identify the critical threshold at which each calibration data point just satisfies the target tolerance. Given a user-specified risk level $\alpha$, we then determine the approximate $1-\alpha$ quantile of all the critical thresholds in the calibration set as the test-time confidence threshold. By conformalizing such critical thresholds, CLS generalizes the statistical regularities observed in the calibration set to new test data, providing a rigorous FNR constraint while yielding more precise and reliable segmentations. We validate the statistical soundness and predictive performance of CLS on six 3D-LS datasets across five backbone models, and conclude with actionable insights for deploying risk-aware segmentation in clinical practice.
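The calibration procedure described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`fnr`, `critical_threshold`, `conformal_threshold`) are hypothetical, and several conventions are assumptions made for illustration. A voxel is assumed to be predicted as lesion when its probability is at least the threshold, every calibration volume is assumed to contain at least one lesion voxel, and the quantile uses the standard split-conformal finite-sample rank. Under this convention a lower threshold is the safe direction, so the $1-\alpha$ quantile is taken over negated critical thresholds; the paper's own parametrization may differ.

```python
import math
import numpy as np


def fnr(prob, mask, t):
    """Fraction of true lesion voxels missed when predicting lesion as prob >= t."""
    lesion_probs = prob[mask.astype(bool)]
    return float(np.mean(lesion_probs < t))


def critical_threshold(prob, mask, eps):
    """Largest threshold at which this sample's FNR is still <= eps.

    Assumes the mask contains at least one lesion voxel and 0 <= eps < 1.
    """
    lesion_probs = np.sort(prob[mask.astype(bool)])
    if lesion_probs.size == 0:
        raise ValueError("FNR is undefined for a volume without lesion voxels")
    # At most k lesion voxels may be missed while keeping FNR <= eps.
    k = int(math.floor(eps * lesion_probs.size))
    # Thresholding at the (k+1)-th smallest lesion probability misses
    # only the voxels strictly below it, which is at most k voxels.
    return float(lesion_probs[k])


def conformal_threshold(critical_thresholds, alpha):
    """Test-time threshold from calibration critical thresholds.

    Uses scores s_i = -t_i and returns minus their ceil((n+1)(1-alpha))-th
    smallest value. Falls back to 0.0 (predict every voxel as lesion) when
    the calibration set is too small for the requested risk level.
    """
    scores = np.sort(-np.asarray(critical_thresholds, dtype=float))
    n = scores.size
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return 0.0
    return float(-scores[rank - 1])
```

In use, one would compute `critical_threshold` for every calibration volume, pass the resulting list to `conformal_threshold`, and binarize each test probability map with `prob >= t_hat`. Under the usual exchangeability assumption between calibration and test samples, a test volume's FNR then exceeds $\varepsilon$ with probability at most $\alpha$.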