TwinTrack: Post-hoc Multi-Rater Calibration for Medical Image Segmentation

📅 2026-04-17
🤖 AI Summary
This work addresses inconsistent expert annotations in the segmentation of pancreatic ductal adenocarcinoma (PDAC) on contrast-enhanced CT scans. Conventional deep learning approaches assume a single ground truth, which can yield poorly calibrated probabilistic outputs that are hard to interpret. The authors propose TwinTrack, a framework that applies post-hoc calibration to ensemble segmentation outputs, mapping them to the Mean Human Response (MHR), the fraction of expert annotators who label a voxel as tumor, so that predicted probabilities directly reflect inter-expert disagreement. The method requires only a small multi-rater calibration set and consistently improves calibration metrics over standard approaches on the MICCAI 2025 CURVAS-PDACVI multi-rater benchmark.

📝 Abstract
Pancreatic ductal adenocarcinoma (PDAC) segmentation on contrast-enhanced CT is inherently ambiguous: inter-rater disagreement among experts reflects genuine uncertainty rather than annotation noise. Standard deep learning approaches assume a single ground truth, producing probabilistic outputs that can be poorly calibrated and difficult to interpret under such ambiguity. We present TwinTrack, a framework that addresses this gap through post-hoc calibration of ensemble segmentation probabilities to the empirical mean human response (MHR), the fraction of expert annotators labeling a voxel as tumor. Calibrated probabilities are thus directly interpretable as the expected proportion of annotators assigning the tumor label, explicitly modeling inter-rater disagreement. The proposed post-hoc calibration procedure is simple and requires only a small multi-rater calibration set. It consistently improves calibration metrics over standard approaches when evaluated on the MICCAI 2025 CURVAS-PDACVI multi-rater benchmark.
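
To make the idea of calibrating toward the MHR concrete, here is a minimal sketch in Python. The abstract does not specify which calibration map TwinTrack uses, so the histogram-binning calibrator below is a stand-in chosen for illustration, and the function names (`mean_human_response`, `fit_histogram_calibrator`, `apply_calibrator`) are hypothetical rather than taken from the paper.

```python
import numpy as np

def mean_human_response(rater_masks):
    """MHR: fraction of raters labeling each voxel as tumor.

    rater_masks has shape (n_raters, ...) with binary entries.
    """
    return np.asarray(rater_masks, dtype=float).mean(axis=0)

def fit_histogram_calibrator(probs, mhr, n_bins=10):
    """Fit a histogram-binning map from ensemble probabilities to the MHR.

    ASSUMPTION: histogram binning is an illustrative choice, not
    necessarily the calibrator used by the authors.
    """
    probs = np.ravel(np.asarray(probs, dtype=float))
    mhr = np.ravel(np.asarray(mhr, dtype=float))
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Assign each voxel to a probability bin with index 0..n_bins-1.
    idx = np.clip(np.digitize(probs, edges[1:-1]), 0, n_bins - 1)
    sums = np.bincount(idx, weights=mhr, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    # Each bin maps to the mean MHR of its voxels; empty bins fall
    # back to the bin center, i.e. an approximately identity mapping.
    values = np.where(counts > 0, sums / np.maximum(counts, 1), centers)
    return edges, values

def apply_calibrator(probs, edges, values):
    """Replace each probability with the calibrated value of its bin."""
    probs = np.asarray(probs, dtype=float)
    idx = np.clip(np.digitize(probs, edges[1:-1]), 0, len(values) - 1)
    return values[idx]

# Toy calibration set: 3 raters, 4 voxels (flattened).
masks = np.array([[1, 1, 0, 0],
                  [1, 0, 0, 0],
                  [1, 1, 1, 0]])
mhr = mean_human_response(masks)
ensemble_probs = np.array([0.95, 0.95, 0.05, 0.05])
edges, values = fit_histogram_calibrator(ensemble_probs, mhr, n_bins=2)
calibrated = apply_calibrator(np.array([0.9, 0.1]), edges, values)
```

In this toy case the ensemble is overconfident relative to the raters, and the fitted map pulls its outputs toward the observed fraction of annotators who marked each voxel as tumor. A real calibration set would pool voxels from several multiply-annotated scans before fitting.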
Problem

Research questions and friction points this paper is trying to address.

medical image segmentation
inter-rater disagreement
uncertainty
calibration
pancreatic ductal adenocarcinoma
Innovation

Methods, ideas, or system contributions that make the work stand out.

post-hoc calibration
multi-rater disagreement
medical image segmentation
probabilistic interpretation
ensemble segmentation
Tristan Kirscher
ICube Laboratory, CNRS UMR-7357, University of Strasbourg, Strasbourg, France; CLCC Institut-Strauss, Strasbourg, France
Alexandra Ertl
German Cancer Research Center (DKFZ) Heidelberg, Division of Medical Image Computing, Heidelberg, Germany; Pattern Analysis and Learning Group, Department of Radiation Oncology, Heidelberg University Hospital, Heidelberg, Germany; Medical Faculty Heidelberg, Heidelberg University, Heidelberg, Germany
Klaus Maier-Hein
German Cancer Research Center (DKFZ) Heidelberg, Division of Medical Image Computing, Heidelberg, Germany; Pattern Analysis and Learning Group, Department of Radiation Oncology, Heidelberg University Hospital, Heidelberg, Germany
Xavier Coubez
Researcher - Institut de Cancérologie Strasbourg Europe
Particle physics, Deep learning, Medicine, Cancer, Genomics
Philippe Meyer
ICube Laboratory, CNRS UMR-7357, University of Strasbourg, Strasbourg, France; CLCC Institut-Strauss, Strasbourg, France
Sylvain Faisan
ICube Laboratory, CNRS UMR-7357, University of Strasbourg, Strasbourg, France