🤖 AI Summary
This work addresses the critical issue of inconsistent label quality—both manual and pseudo-labels—in medical segmentation datasets, which undermines the reliability of model training and evaluation. To tackle this challenge, the authors propose SegAE, the first method to leverage a lightweight vision-language model (VLM) for large-scale assessment of medical segmentation label quality. Trained on over 4 million synthetic image–mask pairs with quality scores, SegAE efficiently rates labels, producing scores that correlate strongly with Dice similarity. Evaluated across 142 anatomical structures, SegAE achieves a Pearson correlation of 0.902 with ground-truth Dice scores while requiring only 0.06 seconds per 3D mask. In practice, this approach reduces annotation costs by one-third, cuts quality-inspection time by 70%, and uncovers pervasive low-quality labels in multiple public datasets.
📝 Abstract
Large-scale medical segmentation datasets often combine manual and pseudo-labels of uneven quality, which can compromise training and evaluation. Low-quality labels may hamper performance and make model training less robust. To address this issue, we propose SegAE (Segmentation Assessment Engine), a lightweight vision-language model (VLM) that automatically predicts label quality across 142 anatomical structures. Trained on over four million image–label pairs with quality scores, SegAE achieves a high correlation coefficient of 0.902 with ground-truth Dice similarity and evaluates a 3D mask in 0.06 s. SegAE offers several practical benefits: (I) our analysis reveals widespread low-quality labeling across public datasets; (II) SegAE improves data efficiency and training performance in active and semi-supervised learning, reducing dataset annotation cost by one-third and quality-checking time by 70% per label. This tool provides a simple and effective solution for quality control in large-scale medical segmentation datasets. The dataset, model weights, and code are released at https://github.com/Schuture/SegAE.
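SegAE's predicted quality scores are evaluated against the Dice similarity between a candidate mask and its ground truth. For readers unfamiliar with that reference metric, here is a minimal sketch of the Dice coefficient for binary 3D masks (this illustrates the metric itself, not the authors' model; the function name and toy masks are hypothetical):

```python
import numpy as np

def dice_score(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8) -> float:
    """Dice similarity coefficient between two binary masks.

    Dice = 2|A ∩ B| / (|A| + |B|); ranges from 0 (no overlap) to 1 (identical).
    """
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    return float(2.0 * intersection / (pred.sum() + gt.sum() + eps))

# Toy example: a "low-quality" label that is missing part of the structure.
gt = np.zeros((8, 8, 8), dtype=bool)
gt[2:6, 2:6, 2:6] = True        # ground truth: a 64-voxel cube
noisy = gt.copy()
noisy[5, :, :] = False          # drop one slice -> 48 voxels remain

print(round(dice_score(noisy, gt), 3))  # 2*48 / (48 + 64) ≈ 0.857
```

A label-quality assessor like SegAE aims to assign such a mask a low score without access to the ground truth, so corrupted labels can be flagged at scale.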