From Cold Start to Active Learning: Embedding-Based Scan Selection for Medical Image Segmentation

📅 2026-01-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high annotation cost of medical image segmentation with a cold-start sampling strategy that combines foundation-model embeddings with clustering, followed by an active learning framework that jointly leverages entropy-based uncertainty and spatial diversity. The cold-start phase automatically selects the number of clusters and samples proportionally across them to improve the representativeness of the initial training set; during active learning, the combined uncertainty-diversity criterion guides sample selection to improve annotation efficiency. The method consistently outperforms existing approaches across three benchmarks—CheXmask, Montgomery, and SynthStrip—achieving a Dice coefficient as high as 0.950 and reducing the Hausdorff distance to as low as 6.38 mm.
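The cold-start pipeline described above (embed each scan, cluster the embeddings with an automatically chosen number of clusters, then sample proportionally from each cluster) can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' code: the silhouette criterion for choosing k, the farthest-point k-means initialisation, and the nearest-to-centroid sampling rule are assumptions, and the 2-D points stand in for foundation-model embeddings.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's k-means with farthest-point initialisation
    (robust when clusters are well separated)."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d = np.linalg.norm(X[:, None] - np.asarray(centers)[None], axis=2).min(1)
        centers.append(X[d.argmax()])
    centers = np.asarray(centers, dtype=float)
    for _ in range(iters):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels, centers

def silhouette(X, labels):
    """Mean silhouette coefficient (O(n^2); fine for a small demo)."""
    D = np.linalg.norm(X[:, None] - X[None], axis=2)
    n, s = len(X), np.zeros(len(X))
    for i in range(n):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # singleton cluster: silhouette defined as 0
        a = D[i, same & (np.arange(n) != i)].mean()
        b = min(D[i, labels == c].mean() for c in set(labels) if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s.mean()

def cold_start_select(X, budget, k_range=range(2, 7)):
    """Pick k by maximum silhouette, then draw from each cluster in
    proportion to its size, taking points closest to the centroid."""
    scored = []
    for k in k_range:
        labels, centers = kmeans(X, k)
        scored.append((silhouette(X, labels), k, labels, centers))
    _, k, labels, centers = max(scored, key=lambda t: t[0])
    picked = []
    for j in range(k):
        idx = np.where(labels == j)[0]
        quota = max(1, round(budget * len(idx) / len(X)))
        d = np.linalg.norm(X[idx] - centers[j], axis=1)
        picked.extend(idx[np.argsort(d)[:quota]].tolist())
    return k, picked[:budget]
```

On three synthetic, well-separated blobs this selects k = 3 and returns a budget-sized, cluster-balanced initial set; with real embeddings the same routine would run on, e.g., pooled encoder features per scan.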

📝 Abstract
Accurate segmentation annotations are critical for disease monitoring, yet manual labeling remains a major bottleneck due to the time and expertise required. Active learning (AL) alleviates this burden by prioritizing informative samples for annotation, typically through a diversity-based cold-start phase followed by uncertainty-driven selection. We propose a novel cold-start sampling strategy that combines foundation-model embeddings with clustering, including automatic selection of the number of clusters and proportional sampling across clusters, to construct a diverse and representative initial training set. This is followed by an uncertainty-based AL framework that integrates spatial diversity to guide sample selection. The proposed method is intuitive and interpretable, enabling visualization of the feature-space distribution of candidate samples. We evaluate our approach on three datasets spanning X-ray and MRI modalities. On the CheXmask dataset, the cold-start strategy outperforms random selection, improving Dice from 0.918 to 0.929 and reducing the Hausdorff distance from 32.41 to 27.66 mm. In the AL setting, combined entropy and diversity selection improves Dice from 0.919 to 0.939 and reduces the Hausdorff distance from 30.10 to 19.16 mm. On the Montgomery dataset, cold-start gains are substantial, with Dice improving from 0.928 to 0.950 and the Hausdorff distance decreasing from 14.22 to 9.38 mm. On the SynthStrip dataset, cold-start selection only slightly affects Dice but reduces the Hausdorff distance from 9.43 to 8.69 mm, while active learning improves Dice from 0.816 to 0.826 and reduces the Hausdorff distance from 7.76 to 6.38 mm. Overall, the proposed framework consistently outperforms baseline methods in low-data regimes, improving segmentation accuracy.
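The combined entropy-and-diversity selection step can be sketched as a greedy batch picker. This is a hypothetical illustration, not the paper's implementation: the abstract specifies only that entropy-based uncertainty is combined with spatial diversity, so the `alpha` weighting, the min-max normalisation, and the farthest-first distance term over feature vectors are all assumptions.

```python
import numpy as np

def entropy_scores(probs, eps=1e-8):
    """Mean per-pixel entropy of softmax maps; probs has shape (N, C, H, W)."""
    return (-(probs * np.log(probs + eps)).sum(axis=1)).mean(axis=(1, 2))

def select_batch(probs, feats, batch, alpha=0.5):
    """Greedy hybrid selection: score = alpha * entropy + (1 - alpha) * diversity,
    where diversity is the distance to the nearest already-selected sample."""
    u = entropy_scores(probs)
    u = (u - u.min()) / (u.max() - u.min() + 1e-8)
    picked = [int(u.argmax())]  # seed with the most uncertain scan
    while len(picked) < batch:
        d = np.linalg.norm(feats[:, None] - feats[picked][None], axis=2).min(axis=1)
        d = (d - d.min()) / (d.max() - d.min() + 1e-8)
        score = alpha * u + (1 - alpha) * d
        score[picked] = -np.inf  # never re-select a scan
        picked.append(int(score.argmax()))
    return picked
```

Setting `alpha=1.0` recovers pure entropy ranking and `alpha=0.0` pure farthest-first diversity, which makes the trade-off the abstract reports (entropy + diversity beating either signal alone) easy to probe.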
Problem

Research questions and friction points this paper is trying to address.

medical image segmentation
active learning
annotation bottleneck
cold start
sample selection
Innovation

Methods, ideas, or system contributions that make the work stand out.

active learning
embedding-based sampling
foundation model
clustering
medical image segmentation
Devon Levy
Department of Information Systems, University of Haifa, Israel

Bar Assayag
Department of Information Systems, University of Haifa, Israel

Laura Gaspar
Department of Medical Imaging Sciences, University of Haifa, Israel

Ilan Shimshoni
Professor of Information Systems, University of Haifa
Computer Vision, Computer Graphics, Robotics, Archaeology

Bella Specktor-Fadida
Department of Medical Imaging Sciences, University of Haifa, Israel