OpenUS: A Fully Open-Source Foundation Model for Ultrasound Image Analysis via Self-Adaptive Masked Contrastive Learning

📅 2025-11-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Ultrasound imaging faces significant challenges—including operator dependency, substantial variability across devices and anatomical domains, prominent speckle noise, and scarce expert annotations—severely limiting the generalizability and label efficiency of AI models. To address these, we propose the first fully open-source, reproducible ultrasound foundation model. Our method introduces an adaptive masking framework that jointly leverages teacher-guided attention and student reconstruction loss to dynamically emphasize clinically salient regions; integrates contrastive learning with masked image modeling; adopts a Vision Mamba backbone to capture long-range spatial dependencies; and employs a progressive dynamic learning schedule to enhance pretraining robustness. Evaluated on our newly curated, publicly available ultrasound dataset of 308,000 images—the largest to date—the model demonstrates exceptional cross-domain transfer performance across multiple anatomical sites and disease tasks, enabling highly label-efficient downstream fine-tuning. The code, pretrained weights, and dataset are fully open-sourced.

📝 Abstract
Ultrasound (US) is one of the most widely used medical imaging modalities, thanks to its low cost, portability, real-time feedback, and absence of ionizing radiation. However, US image interpretation remains highly operator-dependent and varies significantly across anatomical regions, acquisition protocols, and device types. These variations, along with unique challenges such as speckle, low contrast, and limited standardized annotations, hinder the development of generalizable, label-efficient ultrasound AI models. In this paper, we propose OpenUS, the first reproducible, open-source ultrasound foundation model built on a large collection of public data. OpenUS employs a vision Mamba backbone, capturing both local and global long-range dependencies across the image. To extract rich features during pre-training, we introduce a novel self-adaptive masking framework that combines contrastive learning with masked image modeling. This strategy integrates the teacher's attention map with student reconstruction loss, adaptively refining clinically-relevant masking to enhance pre-training effectiveness. OpenUS also applies a dynamic learning schedule to progressively adjust the difficulty of the pre-training process. To develop the foundation model, we compile the largest to-date public ultrasound dataset comprising over 308K images from 42 publicly available datasets, covering diverse anatomical regions, institutions, imaging devices, and disease types. Our pre-trained OpenUS model can be easily adapted to specific downstream tasks by serving as a backbone for label-efficient fine-tuning. Code is available at https://github.com/XZheng0427/OpenUS.
Problem

Research questions and friction points this paper is trying to address.

Developing generalizable AI models for ultrasound image analysis across diverse conditions
Reducing operator dependency and inter-device variability in ultrasound interpretation
Coping with scarce expert annotations and ultrasound-specific artifacts such as speckle and low contrast
Innovation

Methods, ideas, or system contributions that make the work stand out.

A Vision Mamba backbone captures both local features and long-range spatial dependencies
A self-adaptive masking framework combines contrastive learning with masked image modeling, using the teacher's attention map and the student's reconstruction loss to focus masking on clinically relevant regions
A dynamic learning schedule progressively increases pre-training difficulty
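The paper does not publish pseudocode in this summary, but the masking strategy it describes can be sketched as follows. This is an illustrative sketch only: the function names, the linear combination weight `alpha`, the linear ramp for the mask ratio, and the assumption of one attention value and one reconstruction loss per patch are all hypothetical choices, not the authors' exact formulation.

```python
import numpy as np

def adaptive_mask(teacher_attn, recon_loss, mask_ratio, alpha=0.5):
    """Pick patches to mask from a combined saliency score (illustrative).

    teacher_attn: (N,) per-patch attention from the teacher network
    recon_loss:   (N,) per-patch reconstruction loss from the student
    mask_ratio:   fraction of patches to mask this step
    alpha:        hypothetical weight balancing the two signals
    """
    # Min-max normalize each signal so the two are comparable.
    a = (teacher_attn - teacher_attn.min()) / (teacher_attn.max() - teacher_attn.min() + 1e-8)
    r = (recon_loss - recon_loss.min()) / (recon_loss.max() - recon_loss.min() + 1e-8)
    score = alpha * a + (1.0 - alpha) * r

    # Mask the highest-scoring patches: salient to the teacher and/or
    # still hard for the student to reconstruct.
    n_mask = int(round(mask_ratio * len(score)))
    mask = np.zeros(len(score), dtype=bool)
    mask[np.argsort(-score)[:n_mask]] = True
    return mask

def mask_ratio_schedule(epoch, total_epochs, start=0.3, end=0.75):
    """Linearly ramp the mask ratio so pre-training gets progressively harder."""
    t = epoch / max(total_epochs - 1, 1)
    return start + t * (end - start)
```

Under this reading, the schedule makes early epochs easy (few masked patches) and later epochs hard, while the score steers the mask toward clinically salient, poorly reconstructed regions rather than masking uniformly at random.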
Authors

Xiaoyu Zheng, Digital Environment Research Institute (DERI), Queen Mary University of London
Xu Chen, Digital Environment Research Institute (DERI), Queen Mary University of London
Awais Rauf, Digital Environment Research Institute (DERI), Queen Mary University of London
Qifan Fu, Queen Mary University of London
Benedetta Monosi, The William Harvey Research Institute, Queen Mary University of London
F. Rivellese, The William Harvey Research Institute, Queen Mary University of London
Myles J. Lewis, The William Harvey Research Institute, Queen Mary University of London
Shaogang Gong, Queen Mary University of London
Gregory G. Slabaugh, Digital Environment Research Institute (DERI), Queen Mary University of London