Echo-CoPilot: A Multi-View, Multi-Task Agent for Echocardiography Interpretation and Reporting

📅 2025-12-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Echocardiogram interpretation relies heavily on specialist expertise, yet existing models address only isolated subtasks and lack clinically coherent, end-to-end analytical capability. This work introduces the first multi-view, multi-task intelligent agent for end-to-end echocardiogram interpretation: an LLM-based ReAct reasoning framework that decomposes clinical queries into view identification, anatomical segmentation, quantitative measurement, disease classification, and report generation, orchestrating specialized tools to produce guideline-compliant structured outputs and narrative summaries. Its key innovation is a transparent, traceable multi-tool collaboration architecture that enables closed-loop reasoning from raw echocardiographic video to clinical-grade structured reports, with physiological-context-aware decision-making for borderline cases. Evaluated on MIMIC-EchoQA, it achieves 50.8% accuracy, substantially outperforming general-purpose and biomedical multimodal baselines. Qualitative analysis confirms its ability to integrate quantitative metrics with physiological knowledge for clinically grounded interpretation.

📝 Abstract
Echocardiography is central to contemporary cardiovascular care, but full-study interpretation remains a cognitively demanding, multi-view task that is still performed manually. While recent foundation models for echocardiography can achieve strong performance on individual perceptual subtasks such as view classification, segmentation, or disease prediction, they typically operate in isolation and do not provide a unified, clinically coherent assessment. In this work, we introduce Echo-CoPilot, a multi-view, multi-task agent that uses a large language model to orchestrate a suite of specialized echocardiography tools. Within a ReAct-style loop, the agent decomposes clinician queries, invokes tools for view recognition, cardiac structure segmentation, measurement and disease prediction, and report synthesis, and integrates their outputs into guideline-aware answers and narrative summaries. We evaluate Echo-CoPilot on the public MIMIC-EchoQA benchmark, where it achieves an accuracy of 50.8%, outperforming both general-purpose and biomedical video vision-language models. Qualitative analyses further show that the agent leverages quantitative measurements and physiologic context to resolve challenging cases near clinical decision thresholds, such as borderline left ventricular hypertrophy or pericardial effusion severity. The code will be released upon acceptance of the paper.
Problem

Research questions and friction points this paper is trying to address.

Automates multi-view echocardiography interpretation and reporting
Integrates isolated AI tools into a unified clinical assessment
Resolves borderline cardiac cases using quantitative measurements and context
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses large language model to orchestrate specialized echocardiography tools
Integrates multi-view recognition, segmentation, and disease prediction in unified workflow
Generates guideline-aware answers and narrative summaries via ReAct-style loop
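The orchestration pattern described above can be sketched as a minimal ReAct-style trace. This is an illustrative mock, not the paper's implementation: the tool names (`classify_view`, `measure_lv_wall`, `draft_report`), the fixed thought→action→observation plan (which a real agent would generate with an LLM), and the 11 mm septal-thickness threshold are all assumptions made for the example.

```python
# Hypothetical sketch of a ReAct-style tool-orchestration loop for
# echocardiography. Tool names, the fixed plan, and the 11 mm threshold
# are illustrative assumptions; the actual agent selects tools via an LLM.
from typing import List, Tuple

# Mock tools standing in for view recognition, measurement, and reporting.
def classify_view(study: dict) -> str:
    return study.get("view", "PLAX")

def measure_lv_wall(study: dict) -> float:
    return study.get("septal_thickness_mm", 10.0)

def draft_report(view: str, thickness_mm: float) -> str:
    # Illustrative guideline-style cutoff: >11 mm flags possible LVH.
    finding = "borderline LVH" if thickness_mm > 11.0 else "normal wall thickness"
    return f"{view} view: septal thickness {thickness_mm:.1f} mm ({finding})"

def react_loop(study: dict) -> Tuple[str, List[str]]:
    """Run a thought -> action -> observation trace and return the final
    report together with the trace, keeping the reasoning transparent."""
    trace: List[str] = []
    trace.append("Thought: identify the echocardiographic view first.")
    view = classify_view(study)
    trace.append(f"Observation: view = {view}")
    trace.append("Thought: measure septal wall thickness to assess LVH.")
    thickness = measure_lv_wall(study)
    trace.append(f"Observation: septal thickness = {thickness} mm")
    trace.append("Thought: synthesize a guideline-aware report.")
    report = draft_report(view, thickness)
    return report, trace

report, trace = react_loop({"view": "PLAX", "septal_thickness_mm": 11.5})
print(report)
```

Keeping the trace alongside the answer is what makes the workflow traceable: every intermediate tool call and observation can be inspected when a borderline case is adjudicated.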