EchoVQA: Enabling Conversational Assistance for Point-of-Care Cardiac Ultrasound

📅 2026-05-22
🤖 AI Summary
This study addresses a key limitation of point-of-care cardiac ultrasound: its diagnostic utility depends heavily on operator expertise in both image acquisition and interpretation. The authors introduce EchoVQA, which they describe as the first large-scale visual question answering dataset for echocardiography, comprising 14,299 images and 74,819 question-answer pairs. The dataset combines public sources (EchoNet-Dynamic and CAMUS) with point-of-care acquisitions from two handheld probes (Lumify and Clarius), spans diverse views, and includes both high-quality and suboptimal images. It also contains acquisition guidance questions intended to help users position the transducer toward a diagnostic apical four-chamber view for left ventricular ejection fraction estimation. The work further proposes a parameter-efficient method based on multimodal learnable prompts that achieves state-of-the-art performance on most benchmarks, including EchoVQA, with significantly fewer trainable parameters than existing state-of-the-art approaches.
📝 Abstract
Point-of-care transthoracic echocardiography (TTE) enables cardiac assessment in virtually any clinical setting, yet its diagnostic utility remains constrained by the expertise required for image acquisition and interpretation. Visual question answering (VQA) offers a promising paradigm for bridging this expertise gap through interactive clinical assistance, but existing echocardiography VQA datasets are limited in scale, restricted to high-quality images, and cover only a few views. We introduce EchoVQA, the first large-scale VQA dataset for echocardiography, comprising 14,299 images and 74,819 question-answer pairs. The dataset integrates public sources (EchoNet-Dynamic, CAMUS) with our own point-of-care acquisitions from two handheld probes (Lumify, Clarius), spanning diverse views and including both high-quality and suboptimal images. Uniquely, EchoVQA includes acquisition guidance questions to help users optimize transducer positioning toward a diagnostic apical 4-chamber view for left ventricular ejection fraction estimation, a challenging task for novice operators in point-of-care settings. We further develop a parameter-efficient method based on multimodal learnable prompts that achieves state-of-the-art performance on most benchmarks, including EchoVQA, with significantly fewer trainable parameters than existing state-of-the-art approaches.
Problem

Research questions and friction points this paper is trying to address.

point-of-care echocardiography
visual question answering
acquisition guidance
expertise gap
suboptimal images
Innovation

Methods, ideas, or system contributions that make the work stand out.

EchoVQA
visual question answering
point-of-care ultrasound
multimodal prompts
parameter-efficient learning