🤖 AI Summary
Ultrasound imaging faces significant challenges, including operator dependency, substantial variability across devices and anatomical regions, pronounced speckle noise, and scarce expert annotations, which severely limit the generalizability and label efficiency of AI models. To address these, we propose OpenUS, the first fully open-source, reproducible ultrasound foundation model. Our method introduces an adaptive masking framework that jointly uses the teacher's attention map and the student's reconstruction loss to emphasize clinically salient regions; combines contrastive learning with masked image modeling; adopts a Vision Mamba backbone to capture long-range spatial dependencies; and applies a progressive dynamic learning schedule to make pre-training more robust. The model is pre-trained on the largest public ultrasound dataset to date, over 308K images compiled from 42 publicly available datasets spanning multiple anatomical regions, institutions, devices, and disease types. It transfers across anatomical sites and disease tasks and serves as a backbone for label-efficient downstream fine-tuning. The code, pretrained weights, and dataset are open-sourced.
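The summary does not say how the teacher attention and the student reconstruction loss are combined. The sketch below is a minimal, hypothetical illustration. It assumes both signals are available as one score per image patch. Each score vector is min-max normalized, the two are blended with an assumed weight `alpha`, and the highest-scoring patches are masked. The names `adaptive_mask`, `alpha`, and `mask_ratio`, and the blending rule itself, are illustrative assumptions, not the OpenUS implementation.

```python
from typing import List, Sequence


def _minmax(xs: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1]; a constant vector maps to all zeros."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]


def adaptive_mask(
    teacher_attn: Sequence[float],
    student_loss: Sequence[float],
    mask_ratio: float = 0.5,
    alpha: float = 0.5,
) -> List[bool]:
    """Return a per-patch mask (True = masked) favouring salient, hard patches.

    Hypothetical scoring rule: score = alpha * attn + (1 - alpha) * loss,
    with both terms min-max normalized, then mask the top-k patches.
    """
    if len(teacher_attn) != len(student_loss):
        raise ValueError("need one attention score and one loss value per patch")
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must lie in [0, 1]")
    n = len(teacher_attn)
    k = int(round(mask_ratio * n))  # number of patches to mask
    attn, loss = _minmax(teacher_attn), _minmax(student_loss)
    scores = [alpha * a + (1.0 - alpha) * l for a, l in zip(attn, loss)]
    # Sort patch indices by descending score; ties go to the lower index.
    order = sorted(range(n), key=lambda i: (-scores[i], i))
    masked = set(order[:k])
    return [i in masked for i in range(n)]


if __name__ == "__main__":
    attn = [0.0, 4.0, 1.0, 3.0]  # hypothetical teacher attention per patch
    loss = [2.0, 0.0, 4.0, 0.0]  # hypothetical student reconstruction error per patch
    print(adaptive_mask(attn, loss, mask_ratio=0.5, alpha=0.5))
    # → [False, True, True, False]
```

With `alpha=1.0` the mask follows the teacher's attention alone. With `alpha=0.0` it targets the patches the student currently reconstructs worst. Values in between trade off clinical salience against reconstruction difficulty.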
📝 Abstract
Ultrasound (US) is one of the most widely used medical imaging modalities, thanks to its low cost, portability, real-time feedback, and absence of ionizing radiation. However, US image interpretation remains highly operator-dependent and varies significantly across anatomical regions, acquisition protocols, and device types. These variations, along with modality-specific challenges such as speckle, low contrast, and limited standardized annotations, hinder the development of generalizable, label-efficient ultrasound AI models. In this paper, we propose OpenUS, the first reproducible, open-source ultrasound foundation model built on a large collection of public data. OpenUS employs a Vision Mamba backbone that captures both local and global (long-range) dependencies across the image. To extract rich features during pre-training, we introduce a novel self-adaptive masking framework that combines contrastive learning with masked image modeling. This strategy integrates the teacher's attention map with the student's reconstruction loss, adaptively steering the masking toward clinically relevant regions to make pre-training more effective. OpenUS also applies a dynamic learning schedule that progressively adjusts the difficulty of the pre-training process. To develop the foundation model, we compile the largest public ultrasound dataset to date, comprising over 308K images from 42 publicly available datasets and covering diverse anatomical regions, institutions, imaging devices, and disease types. Our pre-trained OpenUS model can be easily adapted to specific downstream tasks by serving as a backbone for label-efficient fine-tuning. Code is available at https://github.com/XZheng0427/OpenUS.
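The abstract states only that the schedule progressively adjusts pre-training difficulty. One hypothetical way to realize this, assumed here purely for illustration, is to ramp two knobs along a cosine curve over training. The first knob is the mask ratio. The second is the fraction of masked patches chosen by guided (attention and loss) scores rather than at random. The function names `cosine_ramp` and `pretrain_schedule` and the start and end values (mask ratio 0.40 to 0.75, guided fraction 0 to 1) are placeholders, not values taken from the paper.

```python
import math
from typing import Dict


def cosine_ramp(epoch: int, total_epochs: int, start: float, end: float) -> float:
    """Smoothly move from `start` at epoch 0 to `end` at the final epoch."""
    if total_epochs <= 1:
        return end
    # Normalized training progress, clamped to [0, 1].
    t = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    # 0.5 * (1 - cos(pi * t)) rises monotonically from 0 to 1.
    w = 0.5 * (1.0 - math.cos(math.pi * t))
    return start + (end - start) * w


def pretrain_schedule(epoch: int, total_epochs: int) -> Dict[str, float]:
    """Hypothetical curriculum: masking gets harder as pre-training progresses."""
    return {
        # Mask a larger share of the image over time (placeholder range).
        "mask_ratio": cosine_ramp(epoch, total_epochs, 0.40, 0.75),
        # Shift from random masking toward guided (salient or hard) masking.
        "guided_fraction": cosine_ramp(epoch, total_epochs, 0.0, 1.0),
    }


if __name__ == "__main__":
    for epoch in (0, 50, 99):
        print(epoch, pretrain_schedule(epoch, total_epochs=100))
```

A cosine ramp changes slowly at the start and end of training and fastest in the middle, so early epochs stay close to easy, mostly random masking. A linear ramp or a step schedule would be equally valid choices under these assumptions.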