Towards Reliable Fetal Ultrasound Interpretation with Multi-Agent Collaboration

📅 2026-05-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses two limitations of existing fetal ultrasound interpretation methods: task-specific systems follow a “one-task, one-model” paradigm and struggle to integrate evidence across clinical stages, while general-purpose multimodal large language models lack domain-specific grounding and are prone to hallucination. The authors propose FetUSAgents, a tool-augmented multi-agent system in which collaborating LLM agents decompose clinical queries into subtasks, from anatomical recognition to quantitative measurement, and invoke specialized vision tools to solve them. Key contributions include Dual-Path Evidence Arbitration (DPEA), which combines LLM reasoning with structured evidence from visual tools; a retrieval-enhanced evidence bank that keeps conclusions traceable; and FetUS-VQA, a dedicated fetal ultrasound VQA benchmark with 1,892 images and 3,205 question-answer pairs across 10 clinical tasks. In out-of-distribution experiments, FetUSAgents exceeds the strongest baseline by more than 25 percent in VQA accuracy.
📝 Abstract
Automated fetal ultrasound interpretation requires a workflow from visual perception, including plane recognition and anatomical segmentation, to clinical understanding, including biometric measurement and diagnostic reporting. However, the prevailing "one-task, one-model" paradigm limits systematic integration of evidence across this multi-step process. Although multimodal large language models (MLLMs) show promising visual understanding, their limited domain-specific grounding and hallucination risks restrict reliability in fetal ultrasound analysis. To address these limitations, we propose FetUSAgents, a tool-augmented multi-agent system for comprehensive fetal ultrasound interpretation, supporting visual question answering (VQA), report generation, image captioning, and video summarization. FetUSAgents coordinates task-specific visual tools through collaborative LLM agents and decomposes clinical queries into subtasks that progress from anatomical recognition to quantitative measurement. We further introduce Dual-Path Evidence Arbitration (DPEA), which integrates LLM-based deliberative reasoning with structured computational evidence from specialized visual tools. A retrieval-enhanced evidence bank consolidates intermediate findings to support traceable and clinically grounded conclusions. In addition, we construct FetUS-VQA, a dedicated VQA benchmark for fetal ultrasound, comprising 1,892 images and 3,205 question-answer pairs across 10 clinical tasks. Extensive out-of-distribution experiments show that FetUSAgents outperforms general and medical MLLMs, exceeding the strongest baseline by more than 25 percent in VQA accuracy. These results suggest a scalable route toward evidence-driven clinical assistants for prenatal imaging. Code is available.
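The abstract describes DPEA only at a high level, so the following is a minimal illustrative sketch of what arbitrating between an LLM reasoning path and a tool evidence path could look like. It is not the authors' implementation. The `Evidence` class, the `arbitrate` rule, the `tool_margin` threshold, and the list-based evidence bank are all hypothetical names and choices made for this example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Evidence:
    source: str        # "llm", "tool", or "both" (hypothetical labels)
    answer: str
    confidence: float  # assumed to lie in [0, 1]


def arbitrate(llm: Evidence, tool: Evidence, tool_margin: float = 0.1) -> Evidence:
    """Hypothetical arbitration rule, not taken from the paper.

    If both paths agree, keep the shared answer. If they disagree, prefer
    the tool evidence unless the LLM path is more confident by `tool_margin`.
    """
    if llm.answer == tool.answer:
        return Evidence("both", llm.answer, max(llm.confidence, tool.confidence))
    if llm.confidence > tool.confidence + tool_margin:
        return llm
    return tool


def answer_query(
    question: str,
    llm_path: Callable[[str], Evidence],
    tool_path: Callable[[str], Evidence],
    bank: List[Evidence],
) -> Evidence:
    """Run both paths, arbitrate, and log every step for traceability."""
    llm = llm_path(question)
    tool = tool_path(question)
    final = arbitrate(llm, tool)
    # The "evidence bank" here is just a list (an assumption for illustration).
    bank.extend([llm, tool, final])
    return final


if __name__ == "__main__":
    bank: List[Evidence] = []
    final = answer_query(
        "Which standard plane is shown?",
        lambda q: Evidence("llm", "abdomen", 0.55),  # stub LLM path
        lambda q: Evidence("tool", "head", 0.90),    # stub plane classifier
        bank,
    )
    print(final.source, final.answer, len(bank))
```

In this sketch the tool path wins on disagreement by default, reflecting the abstract's emphasis on structured computational evidence; a real system would likely use a richer arbitration policy and a retrieval-backed store rather than a plain list.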
Problem

Research questions and friction points this paper is trying to address.

fetal ultrasound interpretation
multi-agent collaboration
evidence integration
domain-specific grounding
hallucination risks
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-agent system
tool-augmented LLM
Dual-Path Evidence Arbitration
fetal ultrasound interpretation
evidence-driven reasoning
Xiaotian Hu
Tsinghua University, Beijing, China
Mingxuan Liu
Tsinghua University
Deep Learning, Neuroimaging, Medical Image Analysis, Spiking Neural Networks, Biomedical Engineering
Junwei Huang
Tsinghua University, Beijing, China
Kasidit Anmahapong
Tsinghua University, Beijing, China
Yifei Chen
Tsinghua University
Artificial Intelligence, Medical Image Analysis, Multimodal, Large Model, AI for Medical
Yiming Huang
UCSD
Natural Language Processing
Xuguang Bai
Tsinghua University, Beijing, China
Zihan Li
University of Washington
Foundation Model, AI for Healthcare, Multimodal Learning
Hongjia Yang
Tsinghua University, Beijing, China
Yingqi Hao
Tsinghua University, Beijing, China
Hong Xu
West China Second University Hospital, Sichuan University, Chengdu, China
Yu Jiang
West China Second University Hospital, Sichuan University, Chengdu, China
Tian Tian
West China Second University Hospital, Sichuan University, Chengdu, China
Yi Liao
Griffith University
Computer Vision, Deep learning, Image Processing, Data Mining, Process Mining
Haibo Qu
West China Second University Hospital, Sichuan University, Chengdu, China
Qiyuan Tian
Tsinghua University, Stanford University, Massachusetts General Hospital, Harvard Medical School
MRI, Diffusion MRI, Neuroimaging, Deep Learning