🤖 AI Summary
This work addresses the challenge of scarce annotated data in few-shot medical image segmentation by proposing a novel framework leveraging self-supervised DINOv3 features. To mitigate the domain gap between natural-image pretraining and medical imaging, the method introduces two key components: WT-Aug, a wavelet-based feature augmentation module, and CG-Fuse, a context-guided fusion module. By integrating wavelet-domain feature enhancement with cross-attention mechanisms, the approach enables effective multi-scale contextual fusion at the feature level. Extensive experiments on six public datasets spanning five imaging modalities demonstrate that the proposed method significantly outperforms existing few-shot segmentation approaches, underscoring its robustness and generalization capability across diverse medical imaging domains.
📝 Abstract
Deep learning-based automatic medical image segmentation plays a critical role in clinical diagnosis and treatment planning but remains challenging in few-shot scenarios due to the scarcity of annotated training data. Recently, self-supervised foundation models such as DINOv3, trained on large-scale natural image datasets, have shown strong potential for dense feature extraction, which can alleviate the few-shot learning challenge. Yet, their direct application to medical images is hindered by the domain gap. In this work, we propose DINO-AugSeg, a novel framework that leverages DINOv3 features to address the few-shot medical image segmentation challenge. Specifically, we introduce WT-Aug, a wavelet-based feature-level augmentation module that enriches the diversity of DINOv3-extracted features by perturbing their frequency components, and CG-Fuse, a contextual-information-guided fusion module that exploits cross-attention to integrate semantic-rich low-resolution features with spatially detailed high-resolution features. Extensive experiments on six public benchmarks spanning five imaging modalities, including MRI, CT, ultrasound, endoscopy, and dermoscopy, demonstrate that DINO-AugSeg consistently outperforms existing methods under limited-sample conditions. The results highlight the effectiveness of incorporating wavelet-domain augmentation and contextual fusion for robust feature representation, suggesting DINO-AugSeg as a promising direction for advancing few-shot medical image segmentation. Code and data will be made available at https://github.com/apple1986/DINO-AugSeg.
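To make the idea of wavelet-based feature-level augmentation concrete, the sketch below perturbs the high-frequency sub-bands of a feature map with a single-level Haar transform. This is an illustration of the general technique only, not the paper's actual WT-Aug implementation: the wavelet choice, sub-band scaling, and the `wt_aug` interface are all assumptions for demonstration.

```python
import numpy as np

def haar_dwt2(x):
    # Single-level 2D Haar transform of an (H, W) array (H, W even):
    # returns low-low (LL) plus three high-frequency sub-bands (LH, HL, HH).
    a = (x[0::2, :] + x[1::2, :]) / 2.0   # row averages
    d = (x[0::2, :] - x[1::2, :]) / 2.0   # row differences
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    # Exact inverse of haar_dwt2.
    a = np.empty((ll.shape[0], ll.shape[1] * 2))
    d = np.empty_like(a)
    a[:, 0::2], a[:, 1::2] = ll + lh, ll - lh
    d[:, 0::2], d[:, 1::2] = hl + hh, hl - hh
    x = np.empty((a.shape[0] * 2, a.shape[1]))
    x[0::2, :], x[1::2, :] = a + d, a - d
    return x

def wt_aug(feat, scale_range=(0.8, 1.2), rng=None):
    """Hypothetical frequency-domain augmentation of a (C, H, W) feature map:
    randomly rescale the high-frequency sub-bands, keep the LL band intact."""
    rng = rng or np.random.default_rng()
    out = np.empty_like(feat)
    for c in range(feat.shape[0]):
        ll, lh, hl, hh = haar_dwt2(feat[c])
        s = rng.uniform(*scale_range, size=3)  # one factor per sub-band
        out[c] = haar_idwt2(ll, lh * s[0], hl * s[1], hh * s[2])
    return out
```

With `scale_range=(1.0, 1.0)` the round trip is an identity, which makes the transform easy to sanity-check; widening the range injects controlled high-frequency variation while preserving the coarse (LL) semantics of the features.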