🤖 AI Summary
This work addresses the challenge of breast ultrasound image segmentation under extreme label scarcity, where existing semi-supervised methods often suffer from unstable pseudo-labels. The authors propose a training-free framework for pseudo-label generation and refinement that adapts a general-purpose vision-language model to medical images via appearance prompts, enabling cross-domain structural transfer. By integrating a static teacher with an exponential moving average teacher model, the approach employs uncertainty-aware entropy-weighted fusion and adaptive reverse contrastive learning to enhance boundary accuracy. Using only 2.5% labeled data, the method achieves performance close to fully supervised baselines across four breast ultrasound datasets, significantly outperforming current semi-supervised approaches while demonstrating strong cross-modal and cross-disease transferability.
📝 Abstract
Semi-supervised learning (SSL) has emerged as a promising paradigm for breast ultrasound (BUS) image segmentation, but it often suffers from unstable pseudo-labels under extremely limited annotations, leading to inaccurate supervision and degraded performance. Recent vision-language models (VLMs) provide a new opportunity for pseudo-label generation, yet their effectiveness on BUS images remains limited because domain-specific prompts are difficult to transfer. To address this issue, we propose a semi-supervised framework with training-free pseudo-label generation and label refinement. By leveraging simple appearance-based descriptions (e.g., "dark oval"), our method enables cross-domain structural transfer between natural and medical images, allowing VLMs to generate structurally consistent pseudo-labels. These pseudo-labels are used to warm up a static teacher that captures global structural priors of breast lesions. Combined with an exponential moving average teacher, we further introduce uncertainty-entropy-weighted fusion and adaptive uncertainty-guided reverse contrastive learning to improve boundary discrimination. Experiments on four BUS datasets demonstrate that our method achieves performance comparable to fully supervised models even with only 2.5% labeled data, significantly outperforming existing SSL approaches. Moreover, the proposed paradigm is readily extensible: for other imaging modalities or diseases, only a global appearance description is required to obtain reliable pseudo supervision, enabling scalable semi-supervised medical image segmentation under limited annotations.
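To make the fusion step concrete, the following is a minimal NumPy sketch of one plausible reading of uncertainty-entropy-weighted fusion: each teacher's per-pixel class probabilities are weighted by the inverse of their Shannon entropy, so the less certain teacher contributes less at each pixel. The function names and exact weighting scheme are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def entropy(p, eps=1e-8):
    # Per-pixel Shannon entropy of a class-probability map of shape (C, H, W).
    return -np.sum(p * np.log(p + eps), axis=0)

def fuse_pseudo_labels(p_static, p_ema, eps=1e-8):
    """Fuse two teachers' probability maps, down-weighting the more
    uncertain (higher-entropy) teacher at each pixel.
    Hypothetical helper; the paper's fusion rule may differ."""
    h_s, h_e = entropy(p_static), entropy(p_ema)
    # Inverse-entropy weights, normalized per pixel so they sum to 1.
    w_s = 1.0 / (h_s + eps)
    w_e = 1.0 / (h_e + eps)
    z = w_s + w_e
    fused = (w_s * p_static + w_e * p_ema) / z
    # Hard pseudo-label = per-pixel argmax over classes.
    return np.argmax(fused, axis=0), fused

# Toy 2-class example on a 2x2 image: static teacher is maximally
# uncertain at pixel (0, 1), so the EMA teacher dominates there.
p_static = np.array([[[0.9, 0.5], [0.6, 0.2]],
                     [[0.1, 0.5], [0.4, 0.8]]])
p_ema    = np.array([[[0.8, 0.7], [0.3, 0.1]],
                     [[0.2, 0.3], [0.7, 0.9]]])
labels, fused = fuse_pseudo_labels(p_static, p_ema)
```

In this toy case the fused map remains a valid per-pixel distribution (each pixel's class probabilities still sum to 1), since it is a convex combination of the two teachers' distributions.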