Performance Estimation for Supervised Medical Image Segmentation Models on Unlabeled Data Using UniverSeg

📅 2025-04-22
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In clinical practice, medical image segmentation models often face the challenge of unlabeled test data, hindering reliable performance assessment. To address this, we propose Segmentation Performance Evaluator (SPE), the first lightweight, plug-and-play unsupervised performance estimation method that accurately predicts six key metrics—including Dice and HD95—without ground-truth annotations. SPE integrates universal meta-learning representations from UniverSeg, uncertainty-aware regression, and cross-domain consistency constraints, enabling generalizable evaluation across diverse segmentation architectures and metrics with zero training overhead and seamless integration into existing pipelines. Evaluated on six public benchmarks, SPE achieves an average Pearson correlation coefficient of 0.956 ± 0.046 and a mean absolute error of 0.025 ± 0.019—significantly outperforming prior approaches. The source code is publicly available.

📝 Abstract
The performance of medical image segmentation models is usually evaluated using metrics like the Dice score and Hausdorff distance, which compare predicted masks to ground truth annotations. However, when applying the model to unseen data, such as in clinical settings, it is often impractical to annotate all the data, making the model's performance uncertain. To address this challenge, we propose the Segmentation Performance Evaluator (SPE), a framework for estimating segmentation models' performance on unlabeled data. This framework is adaptable to various evaluation metrics and model architectures. Experiments on six publicly available datasets across six evaluation metrics, including pixel-based metrics such as the Dice score and distance-based metrics like HD95, demonstrated the versatility and effectiveness of our approach, achieving a high correlation (0.956 ± 0.046) and low MAE (0.025 ± 0.019) compared with the real Dice score on the independent test set. These results highlight its ability to reliably estimate model performance without requiring annotations. The SPE framework integrates seamlessly into any model training process without adding training overhead, enabling performance estimation and facilitating the real-world application of medical image segmentation algorithms. The source code is publicly available.
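The two statistics the paper reports, Pearson correlation and mean absolute error between estimated and real Dice scores, can be sketched as follows. This is a minimal illustration with made-up Dice values, not data from the paper:

```python
import math

# Hypothetical per-image Dice scores: ground-truth vs. estimates
# produced by an SPE-style evaluator (values are illustrative only).
true_dice = [0.91, 0.85, 0.78, 0.95, 0.60, 0.88]
est_dice = [0.90, 0.83, 0.80, 0.94, 0.63, 0.86]

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den

def mae(a, b):
    """Mean absolute error between two equal-length lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

print(f"Pearson r = {pearson_r(true_dice, est_dice):.3f}, "
      f"MAE = {mae(true_dice, est_dice):.3f}")
```

A high Pearson r with low MAE, as reported for SPE, means the estimator both ranks images correctly by segmentation quality and is numerically close to the true metric values.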
Problem

Research questions and friction points this paper is trying to address.

Estimating segmentation model performance on unlabeled medical images
Adapting to various metrics and architectures without ground truth
Enabling reliable clinical application without annotation overhead
Innovation

Methods, ideas, or system contributions that make the work stand out.

Estimates segmentation performance without annotations
Adaptable to various metrics and model architectures
Integrates seamlessly into model training processes
Jingchen Zou
Beijing University of Technology
Deep Learning
Jianqiang Li
College of Computer Science, Beijing University of Technology, Beijing, China
Gabriel Jimenez
Sorbonne Université, Institut du Cerveau - Paris Brain Institute - ICM, CNRS, Inserm, AP-HP, Hôpital de la Pitié Salpêtrière, Paris, France
Qing Zhao
College of Computer Science, Beijing University of Technology, Beijing, China
Daniel Racoceanu
Sorbonne Université, Institut du Cerveau - Paris Brain Institute - ICM, CNRS, Inria, Inserm, AP-HP, Hôpital de la Pitié Salpêtrière, Paris, France
Matias Cosarinsky
Universidad de Buenos Aires
Deep learning
Enzo Ferrante
CONICET & Universidad de Buenos Aires
Medical Imaging, Machine Learning, Computer Vision, ML Fairness
Guanghui Fu
Sorbonne University, Institut du Cerveau-Paris Brain Institute, ARAMIS Lab
Medical image analysis, Computer vision, Natural language processing, Deep learning