VA-Adapter: Adapting Ultrasound Foundation Model to Echocardiography Probe Guidance

📅 2025-10-08
🤖 AI Summary
High expertise requirements and a shortage of skilled sonographers hinder timely access to high-quality echocardiography in primary care settings. To address this, we propose the Parameter-Efficient Vision–Action Adapter (VA-Adapter), which transfers knowledge from pretrained multimodal foundation models to ultrasound probe guidance. VA-Adapter jointly encodes visual frames and historical action sequences, leveraging sequential reasoning to generate precise, real-time probe pose adjustment recommendations—requiring fine-tuning of only a small number of parameters. This significantly reduces reliance on expert experience, improves image acquisition quality by novice operators, and enhances inter-operator consistency. Experiments demonstrate that VA-Adapter outperforms strong existing baselines across multiple probe localization and pose control metrics. Our approach establishes a novel paradigm for deploying medical foundation models into clinical procedural assistance, bridging the gap between large-scale pretraining and practical, operator-level guidance in point-of-care ultrasound.

📝 Abstract
Echocardiography is a critical tool for detecting heart diseases. Recently, ultrasound foundation models have demonstrated remarkable capabilities in cardiac ultrasound image analysis. However, obtaining high-quality ultrasound images is a prerequisite for accurate diagnosis. Because cardiac ultrasound is exceptionally difficult to operate, highly skilled personnel are in short supply, which hinders patients from receiving timely examination services. In this paper, we aim to adapt the medical knowledge that foundation models have learned from vast datasets to the probe guidance task, which provides real-time operational recommendations to help junior sonographers acquire high-quality ultrasound images. Moreover, inspired by the practice where experts refine their action decisions based on past explorations, we design a parameter-efficient Vision-Action Adapter (VA-Adapter) that enables the foundation model's image encoder to encode vision-action sequences, thereby enhancing guidance performance. With built-in sequential reasoning capabilities in a compact design, the VA-Adapter enables a pre-trained ultrasound foundation model to learn precise probe adjustment strategies by fine-tuning only a small subset of parameters. Extensive experiments demonstrate that the VA-Adapter surpasses strong probe guidance models. Our code will be released after acceptance.
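The abstract describes a parameter-efficient adapter that fuses the frozen image encoder's features with a summary of past probe actions, training only a small set of new weights. The following is a minimal, hypothetical NumPy sketch of that general idea (the dimensions, weight names, and fusion scheme are illustrative assumptions, not the paper's actual architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (illustrative, not from the paper).
D_IMG, D_ACT, D_HID = 256, 6, 32

# Frozen stand-in for the foundation model's image encoder weights.
W_frozen = rng.normal(size=(D_IMG, D_IMG))

# Small trainable adapter weights: encode each past probe action,
# then project the fused [image, action] feature through a bottleneck
# and add it back as a residual.
W_act = rng.normal(size=(D_ACT, D_HID)) * 0.01
W_down = rng.normal(size=(D_IMG + D_HID, D_HID)) * 0.01
W_up = rng.normal(size=(D_HID, D_IMG)) * 0.01

def encode(frame_feat, action_history):
    """Fuse a visual feature with a pooled summary of past probe actions."""
    h = frame_feat @ W_frozen                    # frozen backbone pass
    a = np.tanh(action_history @ W_act).mean(0)  # pooled action summary
    z = np.concatenate([h, a]) @ W_down          # adapter bottleneck
    return h + np.maximum(z, 0.0) @ W_up         # residual adapter output

frame = rng.normal(size=D_IMG)
history = rng.normal(size=(5, D_ACT))  # e.g. last 5 probe pose adjustments
out = encode(frame, history)

# Only the adapter weights would be updated during fine-tuning.
trainable = W_act.size + W_down.size + W_up.size
```

Here the trainable adapter (about 17.6k weights) is a fraction of the frozen backbone's 65.5k, which is the essence of parameter-efficient fine-tuning; the real VA-Adapter's design is more elaborate.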
Problem

Research questions and friction points this paper is trying to address.

Adapting ultrasound foundation models for echocardiography probe guidance
Providing real-time operational recommendations to junior sonographers
Enhancing guidance performance through vision-action sequence encoding
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adapts ultrasound foundation model for probe guidance
Uses Vision-Action Adapter to encode vision-action sequences
Fine-tunes small parameter subset for probe adjustments
Teng Wang
Department of Automation, BNRist, Tsinghua University, Beijing, China
Haojun Jiang
Department of Automation, BNRist, Tsinghua University, Beijing, China
Yuxuan Wang
School of Computer Science and Technology, Xidian University, Xi’an, China
Zhenguo Sun
Beijing Academy of Artificial Intelligence, Beijing, China
Shiji Song
Tsinghua University
Modeling and optimization of complex systems and stochastic systems
Gao Huang
Department of Automation, BNRist, Tsinghua University, Beijing, China