From Cold Start to Active Learning: Embedding-Based Scan Selection for Medical Image Segmentation

📅 2026-01-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high annotation cost of medical image segmentation with a cold-start sampling strategy that combines foundation-model embeddings with clustering, followed by an active learning framework that jointly leverages entropy-based uncertainty and spatial diversity. The cold-start phase automatically selects the number of clusters and samples proportionally across them to improve the representativeness of the initial training set; during active learning, the combined uncertainty-diversity criterion guides sample selection to improve annotation efficiency. The method consistently outperforms existing approaches across three benchmarks—CheXmask, Montgomery, and SynthStrip—achieving a Dice coefficient as high as 0.950 and reducing the Hausdorff distance to as low as 6.38 mm.
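The cold-start pipeline described above (embed each scan, cluster the embeddings with an automatically chosen number of clusters, then sample proportionally from each cluster) can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' code: the silhouette criterion for choosing k, the farthest-point k-means initialisation, and the nearest-to-centroid sampling rule are assumptions, and the 2-D points stand in for foundation-model embeddings.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's k-means with farthest-point initialisation
    (robust when clusters are well separated)."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d = np.linalg.norm(X[:, None] - np.asarray(centers)[None], axis=2).min(1)
        centers.append(X[d.argmax()])
    centers = np.asarray(centers, dtype=float)
    for _ in range(iters):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels, centers

def silhouette(X, labels):
    """Mean silhouette coefficient (O(n^2); fine for a small demo)."""
    D = np.linalg.norm(X[:, None] - X[None], axis=2)
    n, s = len(X), np.zeros(len(X))
    for i in range(n):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # singleton cluster: silhouette defined as 0
        a = D[i, same & (np.arange(n) != i)].mean()
        b = min(D[i, labels == c].mean() for c in set(labels) if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s.mean()

def cold_start_select(X, budget, k_range=range(2, 7)):
    """Pick k by maximum silhouette, then draw from each cluster in
    proportion to its size, taking points closest to the centroid."""
    scored = []
    for k in k_range:
        labels, centers = kmeans(X, k)
        scored.append((silhouette(X, labels), k, labels, centers))
    _, k, labels, centers = max(scored, key=lambda t: t[0])
    picked = []
    for j in range(k):
        idx = np.where(labels == j)[0]
        quota = max(1, round(budget * len(idx) / len(X)))
        d = np.linalg.norm(X[idx] - centers[j], axis=1)
        picked.extend(idx[np.argsort(d)[:quota]].tolist())
    return k, picked[:budget]
```

On three synthetic, well-separated blobs this selects k = 3 and returns a budget-sized, cluster-balanced initial set; with real embeddings the same routine would run on, e.g., pooled encoder features per scan.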

📝 Abstract
Accurate segmentation annotations are critical for disease monitoring, yet manual labeling remains a major bottleneck due to the time and expertise required. Active learning (AL) alleviates this burden by prioritizing informative samples for annotation, typically through a diversity-based cold-start phase followed by uncertainty-driven selection. We propose a novel cold-start sampling strategy that combines foundation-model embeddings with clustering, including automatic selection of the number of clusters and proportional sampling across clusters, to construct a diverse and representative initial training set. This is followed by an uncertainty-based AL framework that integrates spatial diversity to guide sample selection. The proposed method is intuitive and interpretable, enabling visualization of the feature-space distribution of candidate samples. We evaluate our approach on three datasets spanning X-ray and MRI modalities. On the CheXmask dataset, the cold-start strategy outperforms random selection, improving Dice from 0.918 to 0.929 and reducing the Hausdorff distance from 32.41 to 27.66 mm. In the AL setting, combined entropy and diversity selection improves Dice from 0.919 to 0.939 and reduces the Hausdorff distance from 30.10 to 19.16 mm. On the Montgomery dataset, cold-start gains are substantial, with Dice improving from 0.928 to 0.950 and the Hausdorff distance decreasing from 14.22 to 9.38 mm. On the SynthStrip dataset, cold-start selection only slightly affects Dice but reduces the Hausdorff distance from 9.43 to 8.69 mm, while active learning improves Dice from 0.816 to 0.826 and reduces the Hausdorff distance from 7.76 to 6.38 mm. Overall, the proposed framework consistently outperforms baseline methods in low-data regimes, improving segmentation accuracy.
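The combined entropy-and-diversity selection step can be sketched as a greedy batch picker. This is a hypothetical illustration, not the paper's implementation: the abstract specifies only that entropy-based uncertainty is combined with spatial diversity, so the `alpha` weighting, the min-max normalisation, and the farthest-first distance term over feature vectors are all assumptions.

```python
import numpy as np

def entropy_scores(probs, eps=1e-8):
    """Mean per-pixel entropy of softmax maps; probs has shape (N, C, H, W)."""
    return (-(probs * np.log(probs + eps)).sum(axis=1)).mean(axis=(1, 2))

def select_batch(probs, feats, batch, alpha=0.5):
    """Greedy hybrid selection: score = alpha * entropy + (1 - alpha) * diversity,
    where diversity is the distance to the nearest already-selected sample."""
    u = entropy_scores(probs)
    u = (u - u.min()) / (u.max() - u.min() + 1e-8)
    picked = [int(u.argmax())]  # seed with the most uncertain scan
    while len(picked) < batch:
        d = np.linalg.norm(feats[:, None] - feats[picked][None], axis=2).min(axis=1)
        d = (d - d.min()) / (d.max() - d.min() + 1e-8)
        score = alpha * u + (1 - alpha) * d
        score[picked] = -np.inf  # never re-select a scan
        picked.append(int(score.argmax()))
    return picked
```

Setting `alpha=1.0` recovers pure entropy ranking and `alpha=0.0` pure farthest-first diversity, which makes the trade-off the abstract reports (entropy + diversity beating either signal alone) easy to probe.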
Problem

Research questions and friction points this paper is trying to address.

medical image segmentation
active learning
annotation bottleneck
cold start
sample selection
Innovation

Methods, ideas, or system contributions that make the work stand out.

active learning
embedding-based sampling
foundation model
clustering
medical image segmentation
Devon Levy
Department of Information Systems, University of Haifa, Israel

Bar Assayag
Department of Information Systems, University of Haifa, Israel

Laura Gaspar
Department of Medical Imaging Sciences, University of Haifa, Israel

Ilan Shimshoni
Professor of Information Systems, University of Haifa
Computer Vision, Computer Graphics, Robotics, Archaeology

Bella Specktor-Fadida
Department of Medical Imaging Sciences, University of Haifa, Israel