AI Summary
To address the high annotation cost and poor generalization under limited samples in skeletal segmentation of pediatric elbow/wrist ultrasound images, this paper proposes FlexICL, the first visual in-context learning framework tailored for intra-video musculoskeletal ultrasound segmentation. Leveraging an innovative cross-frame image stitching strategy and a multi-augmentation training paradigm, FlexICL uses only 5% of frames, annotated as contextual prompts, to enable accurate prediction of skeletal regions on unseen frames. The method integrates a Painter decoder with MAE-VQGAN-based representations, enabling zero-shot adaptation across diverse ultrasound domains without fine-tuning. Evaluated on four public datasets comprising 1,252 ultrasound sweeps, FlexICL consistently outperforms U-Net, TransUNet, and state-of-the-art visual ICL approaches, achieving Dice score improvements of 1-27%. These results demonstrate its efficiency, robustness, and clinical applicability for low-resource musculoskeletal ultrasound analysis.
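To make the intra-video setting concrete, the sketch below shows one plausible way to pick roughly 5% of a sweep's frames for expert annotation, leaving the rest as unseen frames for the model. This is a minimal illustration, not the authors' code; the even-spacing heuristic and all names are assumptions.

```python
# Minimal sketch of the intra-video split: ~5% of frames receive expert masks
# and later act as in-context prompts; the remainder are treated as unseen
# frames to be segmented. The even-spacing heuristic is an assumption.
def split_annotated_frames(num_frames: int, label_fraction: float = 0.05):
    """Return (annotated_ids, unseen_ids) for one ultrasound sweep."""
    step = max(1, round(1 / label_fraction))  # 5% -> roughly every 20th frame
    annotated = set(range(0, num_frames, step))
    unseen = [i for i in range(num_frames) if i not in annotated]
    return sorted(annotated), unseen

# Example: a 200-frame sweep yields 10 annotated prompt frames (0, 20, ..., 180).
prompts, queries = split_annotated_frames(200)
```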
Abstract
Elbow and wrist fractures are the most common fractures in pediatric populations. Automatic segmentation of musculoskeletal structures in ultrasound (US) images can improve diagnostic accuracy and treatment planning. Fractures appear as cortical defects but require expert interpretation. Deep learning (DL) can provide real-time feedback and highlight key structures, helping lightly trained users perform exams more confidently. However, pixel-wise expert annotations for training remain time-consuming and costly. To address this challenge, we propose FlexICL, a novel and flexible in-context learning (ICL) framework for segmenting bony regions in US images. We apply it to an intra-video segmentation setting, where experts annotate only a small subset of frames and the model segments the unseen frames. We systematically investigate image concatenation techniques and training strategies for visual ICL and introduce novel concatenation methods that significantly enhance model performance with limited labeled data. By integrating multiple augmentation strategies, FlexICL achieves robust segmentation performance across four wrist and elbow US datasets while requiring only 5% of the training images. It outperforms state-of-the-art visual ICL models such as Painter and MAE-VQGAN, as well as conventional segmentation models such as U-Net and TransUNet, by 1-27% in Dice coefficient on 1,252 US sweeps. These initial results highlight the potential of FlexICL as an efficient and scalable solution for US image segmentation, well suited for medical imaging use cases where labeled data is scarce.
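For readers unfamiliar with visual ICL, the sketch below illustrates the basic image concatenation idea behind Painter/MAE-VQGAN-style prompting, on which FlexICL's concatenation variants build: a labeled prompt frame and its mask are stitched together with a query frame into a single canvas, and the model inpaints the missing quadrant as the query's segmentation. This is a minimal NumPy sketch under assumed conventions, not the paper's implementation; the function name and layout details are hypothetical.

```python
# Minimal sketch of visual ICL prompt stitching: build a 2x2 canvas
#   [prompt image | prompt mask]
#   [query image  | blank (to be inpainted by the model)]
# All names and the grayscale HxW convention are illustrative assumptions.
import numpy as np

def build_icl_canvas(prompt_img: np.ndarray,
                     prompt_mask: np.ndarray,
                     query_img: np.ndarray) -> np.ndarray:
    """Stitch prompt frame, prompt mask, and query frame into one canvas.
    Inputs are HxW float arrays in [0, 1]."""
    h, w = prompt_img.shape
    canvas = np.zeros((2 * h, 2 * w), dtype=prompt_img.dtype)
    canvas[:h, :w] = prompt_img   # top-left: labeled context frame
    canvas[:h, w:] = prompt_mask  # top-right: its expert mask
    canvas[h:, :w] = query_img    # bottom-left: unseen frame
    # Bottom-right stays blank; the ICL model fills it with the predicted mask.
    return canvas
```

At inference, the model's output in the bottom-right quadrant would be cropped out and thresholded into a binary bone mask for the query frame.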