🤖 AI Summary
This work addresses the critical issue of inconsistent label quality—both manual and pseudo-labels—in medical segmentation datasets, which undermines the reliability of model training and evaluation. To tackle this challenge, the authors propose SegAE, the first method to leverage a lightweight vision-language model (VLM) for large-scale assessment of medical segmentation label quality. Trained on over 4 million synthetic image–mask pairs with quality scores, SegAE efficiently rates labels, producing scores that correlate strongly with Dice similarity. Evaluated across 142 anatomical structures, SegAE achieves a Pearson correlation of 0.902 with ground-truth Dice scores while requiring only 0.06 seconds per 3D mask. In practice, this approach reduces annotation costs by one-third, cuts quality-inspection time by 70%, and uncovers pervasive low-quality labels in multiple public datasets.
📝 Abstract
Large-scale medical segmentation datasets often combine manual and pseudo-labels of uneven quality, which can compromise training and evaluation. Low-quality labels may hamper performance and make model training less robust. To address this issue, we propose SegAE (Segmentation Assessment Engine), a lightweight vision-language model (VLM) that automatically predicts label quality across 142 anatomical structures. Trained on over four million image–label pairs with quality scores, SegAE achieves a high correlation coefficient of 0.902 with ground-truth Dice similarity and evaluates a 3D mask in 0.06 s. SegAE offers several practical benefits: (I) our analysis reveals widespread low-quality labeling across public datasets; (II) SegAE improves data efficiency and training performance in active and semi-supervised learning, reducing dataset annotation cost by one-third and quality-checking time by 70% per label. This tool provides a simple and effective solution for quality control in large-scale medical segmentation datasets. The dataset, model weights, and code are released at https://github.com/Schuture/SegAE.
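SegAE's predicted quality scores are evaluated against the Dice similarity between a candidate mask and its ground truth. For readers unfamiliar with that reference metric, here is a minimal sketch of the Dice coefficient for binary 3D masks (this illustrates the metric itself, not the authors' model; the function name and toy masks are hypothetical):

```python
import numpy as np

def dice_score(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8) -> float:
    """Dice similarity coefficient between two binary masks.

    Dice = 2|A ∩ B| / (|A| + |B|); ranges from 0 (no overlap) to 1 (identical).
    """
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    return float(2.0 * intersection / (pred.sum() + gt.sum() + eps))

# Toy example: a "low-quality" label that is missing part of the structure.
gt = np.zeros((8, 8, 8), dtype=bool)
gt[2:6, 2:6, 2:6] = True        # ground truth: a 64-voxel cube
noisy = gt.copy()
noisy[5, :, :] = False          # drop one slice -> 48 voxels remain

print(round(dice_score(noisy, gt), 3))  # 2*48 / (48 + 64) ≈ 0.857
```

A label-quality assessor like SegAE aims to assign such a mask a low score without access to the ground truth, so corrupted labels can be flagged at scale.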