🤖 AI Summary
This work addresses the performance degradation that medical image segmentation suffers under extreme annotation scarcity, specifically when only a single annotated image is available, by proposing the SemiSAM-O1 framework. The method uses a two-stage pipeline: it first generates initial pseudo-labels by extracting features with a foundation model encoder and propagating class prototypes from the annotated image, then refines these labels through iterative self-training coupled with an uncertainty-guided refinement mechanism. It extends specialist-generalist collaborative learning to the one-label extreme, sidesteps the limitations of the foundation model's prompting interface by using its feature representations directly, and corrects uncertain regions using similarity in the foundation model's global feature space. Experiments show that SemiSAM-O1 substantially narrows the performance gap between one-label semi-supervised and fully supervised settings across diverse imaging modalities and anatomical structures, while also reducing the overhead of online foundation model inference.
📝 Abstract
Semi-supervised learning (SSL) has become a promising solution to alleviate the annotation burden of deep learning-based medical image segmentation models. While recent advances in foundation model-driven SSL have pushed the boundary to extremely limited annotation scenarios, they fail to maintain competitive performance on complex imaging modalities. In this paper, we propose SemiSAM-O1, an annotation-efficient framework that requires only one annotated template image for segmentation. SemiSAM-O1 extends the specialist-generalist collaborative learning framework to the extreme one-label setting by fully exploiting the foundation model's feature representation capability beyond its prompting interface. SemiSAM-O1 operates in two stages. In the first stage, the foundation model's encoder extracts dense features from all volumes, and class prototypes derived from the single annotated template are propagated to the unlabeled pool via feature similarity to produce coarse initial pseudo-labels. In the second stage, an iterative training-and-refinement loop progressively improves both the segmentation model and the pseudo-labels over multiple rounds: each round trains the model from scratch on the current pseudo-labels and generates updated predictions with voxel-wise uncertainty estimates. An uncertainty-guided refinement step then uses the foundation model's global feature space to correct high-uncertainty regions by aggregating labels from their most similar confident neighbors, establishing a virtuous cycle of mutual improvement. Extensive experiments on a wide range of segmentation tasks across different modalities and anatomical targets demonstrate that SemiSAM-O1 significantly narrows the performance gap between one-label semi-supervised learning and full supervision, while substantially reducing the computational overhead of online foundation model inference.
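The two core operations described above, prototype propagation and uncertainty-guided refinement, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not specify the similarity metric, the uncertainty measure, or the aggregation rule, so the sketch assumes cosine similarity, normalized predictive entropy, and a majority vote. The function names and the parameters `tau` (uncertainty threshold) and `k` (number of neighbors) are hypothetical. Dense encoder features are assumed to be already flattened to shape `(num_voxels, feature_dim)`, and every class is assumed to appear in the template.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def class_prototypes(template_feats, template_labels, num_classes):
    """Stage 1a: average the template's features per class (assumed prototype rule)."""
    protos = np.stack(
        [template_feats[template_labels == c].mean(axis=0) for c in range(num_classes)]
    )
    return l2_normalize(protos)


def propagate(feats, protos):
    """Stage 1b: label each unlabeled voxel with its most similar prototype."""
    sims = l2_normalize(feats) @ protos.T  # (num_voxels, num_classes)
    return sims.argmax(axis=1)


def refine(feats, probs, tau=0.5, k=5):
    """Stage 2 refinement: relabel high-uncertainty voxels by a majority vote
    over their k most similar confident voxels in feature space."""
    num_classes = probs.shape[1]
    labels = probs.argmax(axis=1)
    # Normalized entropy in [0, 1] as an assumed stand-in for voxel-wise uncertainty.
    entropy = -(probs * np.log(probs + 1e-8)).sum(axis=1) / np.log(num_classes)
    uncertain = entropy > tau
    confident = ~uncertain
    if not uncertain.any() or not confident.any():
        return labels
    f = l2_normalize(feats)
    sims = f[uncertain] @ f[confident].T  # (num_uncertain, num_confident)
    k = int(min(k, confident.sum()))
    nearest = np.argsort(-sims, axis=1)[:, :k]
    neighbor_labels = labels[confident][nearest]  # (num_uncertain, k)
    votes = np.stack(
        [(neighbor_labels == c).sum(axis=1) for c in range(num_classes)], axis=1
    )
    refined = labels.copy()
    refined[uncertain] = votes.argmax(axis=1)
    return refined


# Toy 2-D features standing in for encoder outputs.
template_feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
template_labels = np.array([0, 0, 1, 1])
protos = class_prototypes(template_feats, template_labels, num_classes=2)
initial = propagate(np.array([[0.8, 0.2], [0.2, 0.8]]), protos)

# The last voxel is predicted as class 0 with low confidence; its nearest
# confident neighbors in feature space mostly belong to class 1.
feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.2, 0.8]])
probs = np.array(
    [[0.99, 0.01], [0.98, 0.02], [0.01, 0.99], [0.02, 0.98], [0.55, 0.45]]
)
refined = refine(feats, probs, tau=0.5, k=3)
```

In the full loop, the refined labels would serve as the training targets for the next round, in which the segmentation model is retrained from scratch. A pairwise similarity matrix as used here would not scale to whole volumes, so a practical version would presumably subsample voxels or use an approximate nearest-neighbor index.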