Average Calibration Error: A Differentiable Loss for Improved Reliability in Image Segmentation

📅 2024-03-11

🏛️ International Conference on Medical Image Computing and Computer-Assisted Intervention

📈 Citations: 4

✨ Influential: 1

🤖 AI Summary

Medical image segmentation models often produce overconfident, miscalibrated pixel-wise predictions, which hinders their trustworthy clinical deployment. To address this, we propose the marginal L1 average calibration error (mL1-ACE) as an auxiliary loss. Although it uses hard binning, the loss is directly differentiable, so it can be optimized end to end without surrogate or soft-binning approximations. We further introduce dataset reliability histograms, which extend conventional reliability diagrams to assess the calibration of semantic segmentation at the dataset level. The method improves calibration without compromising segmentation accuracy: on BraTS 2021, it reduces average and maximum calibration error by 45% and 55%, respectively, while maintaining a Dice score of 87%. The implementation is publicly available.
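A minimal sketch of how mL1-ACE could be computed, assuming equal-width hard bins on [0, 1], a one-vs-rest (marginal) view of each class, an L1 gap between mean confidence and observed frequency in each bin, and an unweighted average over non-empty bins and then over classes. The function `ml1_ace`, its `(C, N)` input layout, and the default of 20 bins are illustrative assumptions, not the authors' code.

```python
import numpy as np

def ml1_ace(probs, labels, n_bins=20):
    """Marginal L1 average calibration error (illustrative sketch).

    probs:  array of shape (C, N), per-class softmax probabilities for N pixels.
    labels: integer array of shape (N,), ground-truth class of each pixel.
    Assumes equal-width hard bins on [0, 1] and an unweighted mean over
    non-empty bins, computed one-vs-rest per class and then averaged.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    per_class = []
    for c in range(probs.shape[0]):
        p = probs[c]
        y = (labels == c).astype(float)  # one-vs-rest target for class c
        # Hard bin index; clip so that p == 1.0 lands in the last bin.
        bins = np.minimum((p * n_bins).astype(int), n_bins - 1)
        gaps = []
        for b in range(n_bins):
            mask = bins == b
            if mask.any():
                # L1 gap: |mean confidence - observed frequency| in this bin
                gaps.append(abs(p[mask].mean() - y[mask].mean()))
        per_class.append(np.mean(gaps))  # unweighted over non-empty bins
    return float(np.mean(per_class))     # marginal: average over classes

# Two classes, four pixels, every pixel in its own bin.
probs = np.array([[0.9, 0.8, 0.3, 0.1],
                  [0.1, 0.2, 0.7, 0.9]])
labels = np.array([0, 0, 1, 1])
print(ml1_ace(probs, labels, n_bins=10))  # ≈ 0.175
```

Under these assumptions every non-empty bin counts equally, unlike the expected calibration error (ECE), which weights bins by pixel count; this keeps sparsely populated but badly calibrated bins from being swamped by the many confident background pixels common in segmentation.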

📝 Abstract

Deep neural networks for medical image segmentation often produce overconfident results misaligned with empirical observations. Such miscalibration challenges their clinical translation. We propose to use marginal L1 average calibration error (mL1-ACE) as a novel auxiliary loss function to improve pixel-wise calibration without compromising segmentation quality. We show that this loss, despite using hard binning, is directly differentiable, bypassing the need for approximate but differentiable surrogate or soft binning approaches. Our work also introduces the concept of dataset reliability histograms, which generalise standard reliability diagrams for refined visual assessment of calibration in semantic segmentation aggregated at the dataset level. Using mL1-ACE, we reduce average and maximum calibration error by 45% and 55%, respectively, while maintaining a Dice score of 87% on the BraTS 2021 dataset. We share our code here: https://github.com/cai4cai/ACE-DLIRIS
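Why a hard-binned loss can still be differentiable can be checked on a toy example. Bin membership is a step function of the probabilities, so its derivative is zero everywhere except at bin edges; for a fixed assignment, each bin's term |mean confidence - observed frequency| depends on the probabilities only through a mean. The gradient for pixel i in bin b is then sign(gap_b) / (n_b · K), where n_b is the bin's pixel count and K the number of non-empty bins. This is a sketch for a single class, assuming equal-width bins; `ace_and_grad` is a hypothetical helper, and in actual training an autodiff framework would propagate this gradient through the softmax to the network.

```python
import numpy as np

def ace_and_grad(p, y, n_bins=10):
    """L1 average calibration error for one class, plus its analytic gradient.

    Bin membership is piecewise constant in p, so (away from bin edges) it
    contributes no gradient; only the per-bin mean confidence depends on p.
    """
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=float)
    bins = np.minimum((p * n_bins).astype(int), n_bins - 1)  # hard binning
    occupied = np.unique(bins)
    grad = np.zeros_like(p)
    total = 0.0
    for b in occupied:
        mask = bins == b
        gap = p[mask].mean() - y[mask].mean()
        total += abs(gap)
        # d|mean(p_b) - freq_b| / dp_i = sign(gap) / n_b for every pixel i in bin b
        grad[mask] = np.sign(gap) / mask.sum()
    k = len(occupied)
    return total / k, grad / k

# Central finite differences agree with the analytic gradient, because a small
# perturbation does not move any pixel across a bin edge.
p = np.array([0.12, 0.18, 0.55, 0.83, 0.87])
y = np.array([0.0, 1.0, 1.0, 0.0, 1.0])
loss, grad = ace_and_grad(p, y)
eps = 1e-6
numeric = np.array([
    (ace_and_grad(p + eps * e, y)[0] - ace_and_grad(p - eps * e, y)[0]) / (2 * eps)
    for e in np.eye(len(p))
])
print(np.allclose(grad, numeric, atol=1e-6))  # True
```

The gradient is undefined only on bin edges, a set of measure zero, which is the same situation as the kink of a ReLU.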

Problem

Research questions and friction points this paper is trying to address.

Overconfident results in medical image segmentation

Miscalibration challenges clinical translation of models

Need for improved pixel-wise calibration without loss of segmentation quality

Innovation

Methods, ideas, or system contributions that make the work stand out.

mL1-ACE loss improves pixel-wise calibration

Hard-binned calibration loss is directly differentiable, bypassing surrogate and soft-binning methods

Dataset reliability histograms refine visual assessment of calibration at the dataset level
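The dataset reliability histogram is described here only at a high level. One plausible reading, which is an assumption rather than the paper's exact definition, is to keep the per-image spread instead of pooling all pixels into a single reliability curve: for each confidence bin, build a histogram across images of the observed frequency in that bin. The sketch does this for one class; `dataset_reliability_histogram` and its parameters are hypothetical.

```python
import numpy as np

def dataset_reliability_histogram(images, n_bins=10, n_freq_bins=10):
    """Per-bin distribution of per-image observed frequency (hypothetical sketch).

    images: iterable of (p, y) pairs for one class, where p holds predicted
    probabilities and y the binary ground truth for one image's pixels.
    Returns an (n_bins, n_freq_bins) count matrix: row b is a histogram,
    over images, of the observed frequency in confidence bin b.
    """
    counts = np.zeros((n_bins, n_freq_bins), dtype=int)
    for p, y in images:
        p = np.asarray(p, dtype=float)
        y = np.asarray(y, dtype=float)
        bins = np.minimum((p * n_bins).astype(int), n_bins - 1)  # hard binning
        for b in np.unique(bins):
            freq = y[bins == b].mean()  # this image's observed frequency in bin b
            f = min(int(freq * n_freq_bins), n_freq_bins - 1)
            counts[b, f] += 1
    return counts

images = [
    (np.array([0.95, 0.92, 0.15]), np.array([1, 1, 0])),
    (np.array([0.91, 0.05]), np.array([0, 0])),
]
counts = dataset_reliability_histogram(images)
print(counts[9])  # [1 0 0 0 0 0 0 0 0 1]
```

Rendered as a heatmap with confidence bins on one axis and observed frequency on the other, a well-calibrated model would concentrate counts near the diagonal, and the spread within each row shows how consistently that holds across cases. In the toy data, the top confidence bin is split between one image where it was always right and one where it was always wrong, which a pooled reliability diagram would average away.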
