U2-BENCH: Benchmarking Large Vision-Language Models on Ultrasound Understanding

📅 2025-05-23
🤖 AI Summary
Ultrasound image quality is highly susceptible to operator variability, noise, and anatomical heterogeneity, yet the capabilities of Large Vision-Language Models (LVLMs) on ultrasound imagery have not been systematically assessed.

Method: We introduce the first comprehensive benchmark for LVLM-based ultrasound understanding. It comprises eight clinically motivated tasks covering classification, detection, regression, and text generation, built from 7,241 real-world cases spanning 15 anatomical regions and 50 clinical scenarios. We also establish a unified, open-source, multi-granularity evaluation framework designed for dynamic, noise-sensitive, operator-dependent medical imaging.

Contribution/Results: Evaluating 20 state-of-the-art LVLMs across models, tasks, and evaluation dimensions, we find strong performance on image-level classification but significant bottlenecks in spatial localization and clinical report generation. This benchmark fills a critical gap in ultrasound AI evaluation, providing a reproducible foundation and concrete directions for future work.

📝 Abstract
Ultrasound is a widely used imaging modality critical to global healthcare, yet its interpretation remains challenging because image quality varies with the operator, noise, and anatomical structure. Although large vision-language models (LVLMs) have demonstrated impressive multimodal capabilities across natural and medical domains, their performance on ultrasound remains largely unexplored. We introduce U2-BENCH, the first comprehensive benchmark to evaluate LVLMs on ultrasound understanding across classification, detection, regression, and text generation tasks. U2-BENCH aggregates 7,241 cases spanning 15 anatomical regions and defines 8 clinically inspired tasks, such as diagnosis, view recognition, lesion localization, clinical value estimation, and report generation, across 50 ultrasound application scenarios. We evaluate 20 state-of-the-art LVLMs, both open- and closed-source, general-purpose and medical-specific. Our results reveal strong performance on image-level classification but persistent challenges in spatial reasoning and clinical language generation. U2-BENCH establishes a rigorous, unified testbed to assess and accelerate LVLM research in the uniquely multimodal domain of medical ultrasound imaging.
Problem

Research questions and friction points this paper is trying to address.

Evaluating LVLMs on ultrasound understanding tasks
Assessing performance across classification, detection, and text generation
Addressing challenges in spatial reasoning and clinical language generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

First benchmark for LVLMs on ultrasound
Evaluates 20 models across 50 scenarios
Tests classification, detection, and text generation
Anjie Le
Dolphin AI, University of Oxford
Henan Liu
Dolphin AI, Beihang University
Yue Wang
Zhenyu Liu
Dolphin AI
Rongkun Zhu
Hong Kong Baptist University
Taohan Weng
Dolphin AI, Beihang University
Jinze Yu
Dolphin AI, Beihang University
Boyang Wang
Dolphin AI, Beihang University
Yalun Wu
Beihang University
Kaiwen Yan
Quanlin Sun
University of Cambridge
Meirui Jiang
Dolphin AI, The Chinese University of Hong Kong
Jialun Pei
The Chinese University of Hong Kong
Deep Learning, Scene Understanding, AI for Healthcare, Surgical AI
Siya Liu
Dolphin AI
Haoyun Zheng
Dolphin AI
Zhoujun Li
Beihang University
Artificial Intelligence, Natural Language Processing, Network Security
Alison Noble
Technikos Professor of Biomedical Engineering, University of Oxford, UK
Medical image analysis, machine learning in medical imaging, ultrasound, fetal imaging
Jacques Souquet
Dolphin AI, Chinese Academy of Sciences
Xiaoqing Guo
Assistant Professor at Hong Kong Baptist University; Visiting Fellow at the University of Oxford
Medical Image Analysis, Ultrasound, Computer Vision
Manxi Lin
Alibaba Group
ultrasound imaging
Hongcheng Guo
School of Data Science, Fudan University
LLMs, Multimodal LLMs