Dolphin v1.0 Technical Report

📅 2025-09-30
🤖 AI Summary
Ultrasound diagnosis has long suffered from operator dependency, image noise, and real-time inference constraints, all of which hinder clinical deployment of AI models. To address these challenges, we introduce Dolphin v1.0, the first ultrasound-specific multimodal foundation model, and its reasoning-enhanced variant, Dolphin R1. We propose a unified training paradigm that integrates textbook knowledge, public ultrasound data, synthetic samples, and general-domain corpora. Using a three-stage training strategy (domain-specific pretraining, instruction alignment, and reinforcement learning–based refinement) together with an ultrasound-tailored reward mechanism, Dolphin R1 significantly improves diagnostic reasoning, output consistency, and interpretability. On the U2-Bench benchmark, Dolphin R1 achieves a U2-score of 0.5835, roughly double the score of the next-best model. The results provide empirical validation of a unified multimodal framework for complex, dynamic medical imaging and set a new state of the art for ultrasound AI.

📝 Abstract
Ultrasound is crucial in modern medicine but faces challenges such as operator dependence, image noise, and real-time scanning, all of which hinder AI integration. While large multimodal models excel in other medical imaging areas, they struggle with ultrasound's complexities. To address this, we introduce Dolphin v1.0 (V1) and its reasoning-augmented version, Dolphin R1, the first large-scale multimodal ultrasound foundation models unifying diverse clinical tasks in a single vision-language framework.

To tackle ultrasound variability and noise, we curated a 2-million-scale multimodal dataset combining textbook knowledge, public data, synthetic samples, and general corpora. This ensures robust perception, generalization, and clinical adaptability.

The Dolphin series employs a three-stage training strategy: domain-specialized pretraining, instruction-driven alignment, and reinforcement-based refinement. Dolphin v1.0 delivers reliable performance in classification, detection, regression, and report generation. Dolphin R1 enhances diagnostic inference, reasoning transparency, and interpretability through reinforcement learning with ultrasound-specific rewards.

Evaluated on U2-Bench across eight ultrasound tasks, Dolphin R1 achieves a U2-score of 0.5835, nearly twice that of the second-best model (0.2968), setting a new state of the art. Dolphin v1.0 also performs competitively, validating the unified framework. Comparisons show that reasoning-enhanced training significantly improves diagnostic accuracy, consistency, and interpretability, highlighting its importance for high-stakes medical AI.
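
As a quick sanity check on the headline comparison, the ratio and absolute gap between the two U2-scores quoted in the abstract can be computed directly. This is only an arithmetic sketch: the variable names are illustrative, and the only inputs are the two reported scores.

```python
# U2-scores as quoted in the abstract (no other benchmark data is assumed).
dolphin_r1_score = 0.5835
second_best_score = 0.2968

# Relative improvement: how many times larger the Dolphin R1 score is.
ratio = dolphin_r1_score / second_best_score

# Absolute improvement in U2-score points.
absolute_gap = dolphin_r1_score - second_best_score

print(f"ratio: {ratio:.3f}x, absolute gap: {absolute_gap:.4f}")
```

The ratio comes out just under 2, so "roughly twice" is the more precise reading of the reported numbers.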
Problem

Research questions and friction points this paper is trying to address.

Addressing ultrasound's operator dependence and image noise challenges
Unifying diverse clinical ultrasound tasks in a single framework
Enhancing diagnostic accuracy and interpretability for medical AI
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large multimodal ultrasound foundation model unifies clinical tasks
Three-stage training strategy with reinforcement learning refinement
Uses curated multimodal dataset for robust perception and generalization
Taohan Weng
Dolphin AI
Chi Zhang
Dolphin AI
Chaoran Yan
Dolphin AI
Siya Liu
Dolphin AI
Xiaoyang Liu
Dolphin AI
Yalun Wu
Dolphin AI
Boyang Wang
Dolphin AI
Boyan Wang
Dolphin AI
Jiren Ren
Dolphin AI
Kaiwen Yan
Dolphin AI
Jinze Yu
Dolphin AI
Kaibing Hu
Dolphin AI
Henan Liu
Dolphin AI
Haoyun Zheng
Dolphin AI
Anjie Le
Dolphin AI
Hongcheng Guo
School of Data Science, Fudan University
LLMs, Multimodal LLMs