🤖 AI Summary
This work addresses two limitations of existing fetal ultrasound interpretation methods: task-specific models follow a “one-task, one-model” paradigm and struggle to integrate evidence across multiple clinical stages, while general-purpose multimodal large language models suffer from insufficient domain adaptation and hallucination risks. To overcome these challenges, the authors propose FetUSAgents, a tool-augmented multi-agent system in which collaborating LLM agents decompose clinical queries into subtasks, such as anatomical identification and quantitative measurement, and invoke specialized vision tools to solve them. Key contributions include a Dual-Path Evidence Arbitration (DPEA) mechanism that fuses large language model reasoning with structured visual evidence, a traceable retrieval-augmented evidence repository, and FetUS-VQA, a dedicated fetal ultrasound visual question answering benchmark. In out-of-distribution experiments, FetUSAgents outperforms the strongest baseline by more than 25% in VQA accuracy, suggesting a scalable route toward evidence-driven clinical assistants for prenatal imaging.
📝 Abstract
Automated fetal ultrasound interpretation requires a workflow from visual perception, including plane recognition and anatomical segmentation, to clinical understanding, including biometric measurement and diagnostic reporting. However, the prevailing "one-task, one-model" paradigm limits systematic integration of evidence across this multi-step process. Although multimodal large language models (MLLMs) show promising visual understanding, their limited domain-specific grounding and hallucination risks restrict reliability in fetal ultrasound analysis. To address these limitations, we propose FetUSAgents, a tool-augmented multi-agent system for comprehensive fetal ultrasound interpretation, supporting visual question answering (VQA), report generation, image captioning, and video summarization. FetUSAgents coordinates task-specific visual tools through collaborative LLM agents and decomposes clinical queries into subtasks that progress from anatomical recognition to quantitative measurement. We further introduce Dual-Path Evidence Arbitration (DPEA), which integrates LLM-based deliberative reasoning with structured computational evidence from specialized visual tools. A retrieval-enhanced evidence bank consolidates intermediate findings to support traceable and clinically grounded conclusions. In addition, we construct FetUS-VQA, a dedicated VQA benchmark for fetal ultrasound, comprising 1,892 images and 3,205 question-answer pairs across 10 clinical tasks. Extensive out-of-distribution experiments show that FetUSAgents outperforms general and medical MLLMs, exceeding the strongest baseline by more than 25 percent in VQA accuracy. These results suggest a scalable route toward evidence-driven clinical assistants for prenatal imaging. Code is available.
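The abstract does not spell out how DPEA reconciles its two paths, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes each path yields a candidate answer with a confidence score, that disagreements are settled in favor of the tool path when its confidence clears a threshold, and that every finding is logged for traceability. The names `Evidence`, `EvidenceBank`, `arbitrate`, and `tool_threshold`, as well as the tie-breaking rule itself, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    """One intermediate finding, kept so the final answer can be traced back."""
    source: str        # hypothetical label, e.g. "llm" or a tool name
    claim: str         # candidate answer proposed by this source
    confidence: float  # assumed to lie in [0, 1]; the scoring scheme is an assumption


@dataclass
class EvidenceBank:
    """Hypothetical store holding every finding that contributed to an answer."""
    items: list = field(default_factory=list)

    def add(self, ev: Evidence) -> None:
        self.items.append(ev)

    def retrieve(self, claim: str) -> list:
        # Naive exact-match lookup; a real retrieval step would likely be richer.
        return [ev for ev in self.items if ev.claim == claim]


def arbitrate(llm_path: Evidence, tool_path: Evidence, bank: EvidenceBank,
              tool_threshold: float = 0.5) -> str:
    """Choose a final answer from the two paths (assumed rule, not the paper's)."""
    # Record both findings first, so the decision stays auditable either way.
    bank.add(llm_path)
    bank.add(tool_path)
    if llm_path.claim == tool_path.claim:
        return llm_path.claim
    # On disagreement, prefer structured tool evidence when it is confident enough.
    if tool_path.confidence >= tool_threshold:
        return tool_path.claim
    return llm_path.claim


bank = EvidenceBank()
answer = arbitrate(
    Evidence("llm", "femur", 0.6),
    Evidence("plane_classifier", "abdomen", 0.9),
    bank,
)
supporting = bank.retrieve(answer)
```

Here the two paths disagree and the tool path is above the threshold, so `answer` takes the tool's claim and `supporting` lists the finding behind it. Logging both paths before deciding is the design choice that keeps the conclusion traceable even when one path is overruled.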