AI Summary
To address the high annotation cost and poor generalization under limited samples in skeletal segmentation of pediatric elbow/wrist ultrasound images, this paper proposes FlexICL, the first visual in-context learning framework tailored for intra-video musculoskeletal ultrasound segmentation. Leveraging an innovative cross-frame image stitching strategy and a multi-augmentation training paradigm, FlexICL uses only 5% of frames, annotated as contextual prompts, to enable accurate prediction of skeletal regions on unseen frames. The method integrates a Painter decoder with MAE-VQGAN-based representations, enabling zero-shot adaptation across diverse ultrasound domains without fine-tuning. Evaluated on four public datasets comprising 1,252 ultrasound sweeps, FlexICL consistently outperforms U-Net, TransUNet, and state-of-the-art visual ICL approaches, achieving Dice score improvements of 1-27%. These results demonstrate its efficiency, robustness, and clinical applicability for low-resource musculoskeletal ultrasound analysis.
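To make the intra-video setting concrete, the sketch below shows one plausible way to pick roughly 5% of a sweep's frames for expert annotation, leaving the rest as unseen frames for the model. This is a minimal illustration, not the authors' code; the even-spacing heuristic and all names are assumptions.

```python
# Minimal sketch of the intra-video split: ~5% of frames receive expert masks
# and later act as in-context prompts; the remainder are treated as unseen
# frames to be segmented. The even-spacing heuristic is an assumption.
def split_annotated_frames(num_frames: int, label_fraction: float = 0.05):
    """Return (annotated_ids, unseen_ids) for one ultrasound sweep."""
    step = max(1, round(1 / label_fraction))  # 5% -> roughly every 20th frame
    annotated = set(range(0, num_frames, step))
    unseen = [i for i in range(num_frames) if i not in annotated]
    return sorted(annotated), unseen

# Example: a 200-frame sweep yields 10 annotated prompt frames (0, 20, ..., 180).
prompts, queries = split_annotated_frames(200)
```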
Abstract
Elbow and wrist fractures are the most common fractures in pediatric populations. Automatic segmentation of musculoskeletal structures in ultrasound (US) images can improve diagnostic accuracy and treatment planning. Fractures appear as cortical defects but require expert interpretation. Deep learning (DL) can provide real-time feedback and highlight key structures, helping lightly trained users perform exams more confidently. However, pixel-wise expert annotations for training remain time-consuming and costly. To address this challenge, we propose FlexICL, a novel and flexible in-context learning (ICL) framework for segmenting bony regions in US images. We apply it to an intra-video segmentation setting, where experts annotate only a small subset of frames and the model segments the unseen frames. We systematically investigate image concatenation techniques and training strategies for visual ICL and introduce novel concatenation methods that significantly enhance model performance with limited labeled data. By integrating multiple augmentation strategies, FlexICL achieves robust segmentation performance across four wrist and elbow US datasets while requiring only 5% of the training images. It outperforms state-of-the-art visual ICL models such as Painter and MAE-VQGAN, as well as conventional segmentation models such as U-Net and TransUNet, by 1-27% in Dice coefficient on 1,252 US sweeps. These initial results highlight the potential of FlexICL as an efficient and scalable solution for US image segmentation, well suited for medical imaging use cases where labeled data is scarce.
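For readers unfamiliar with visual ICL, the sketch below illustrates the basic image concatenation idea behind Painter/MAE-VQGAN-style prompting, on which FlexICL's concatenation variants build: a labeled prompt frame and its mask are stitched together with a query frame into a single canvas, and the model inpaints the missing quadrant as the query's segmentation. This is a minimal NumPy sketch under assumed conventions, not the paper's implementation; the function name and layout details are hypothetical.

```python
# Minimal sketch of visual ICL prompt stitching: build a 2x2 canvas
#   [prompt image | prompt mask]
#   [query image  | blank (to be inpainted by the model)]
# All names and the grayscale HxW convention are illustrative assumptions.
import numpy as np

def build_icl_canvas(prompt_img: np.ndarray,
                     prompt_mask: np.ndarray,
                     query_img: np.ndarray) -> np.ndarray:
    """Stitch prompt frame, prompt mask, and query frame into one canvas.
    Inputs are HxW float arrays in [0, 1]."""
    h, w = prompt_img.shape
    canvas = np.zeros((2 * h, 2 * w), dtype=prompt_img.dtype)
    canvas[:h, :w] = prompt_img   # top-left: labeled context frame
    canvas[:h, w:] = prompt_mask  # top-right: its expert mask
    canvas[h:, :w] = query_img    # bottom-left: unseen frame
    # Bottom-right stays blank; the ICL model fills it with the predicted mask.
    return canvas
```

At inference, the model's output in the bottom-right quadrant would be cropped out and thresholded into a binary bone mask for the query frame.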