🤖 AI Summary
Problem: Existing AI models for cardiac ultrasound analysis rely predominantly on single-frame processing and lack both video-level spatiotemporal reasoning and guideline-driven measurement interpretation. Method: We propose the first clinical-guideline–aware intelligent agent framework for end-to-end echocardiographic video analysis. It employs a measurement-feasibility prediction model to assess anatomical measurability and autonomously invoke specialized visual tools; integrates large language models with customized vision modules to jointly perform temporal localization, spatial quantification, and clinical report generation; and enables full-pipeline visual evidence tracing. Contribution/Results: Evaluated on a clinically validated video-query benchmark, our method significantly improves measurement accuracy and reporting compliance. It generates structured diagnostic reports that are fully aligned with ACC/AHA guidelines and are clinically interpretable, traceable, and actionable.
📝 Abstract
Purpose: Echocardiographic interpretation requires video-level reasoning and guideline-based measurement analysis, which current deep learning models for cardiac ultrasound do not support. We present EchoAgent, a framework that enables structured, interpretable automation for this domain. Methods: EchoAgent orchestrates specialized vision tools under large language model (LLM) control to perform temporal localization, spatial measurement, and clinical interpretation. A key contribution is a measurement-feasibility prediction model that determines whether anatomical structures are reliably measurable in each frame, enabling autonomous tool selection. We curated a benchmark of diverse, clinically validated video-query pairs for evaluation. Results: EchoAgent achieves accurate, interpretable results despite the added complexity of spatiotemporal video analysis. Outputs are grounded in visual evidence and clinical guidelines, supporting transparency and traceability. Conclusion: This work demonstrates the feasibility of agentic, guideline-aligned reasoning for echocardiographic video analysis, enabled by task-specific tools and full video-level automation. EchoAgent sets a new direction for trustworthy AI in cardiac ultrasound.
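The feasibility-gated tool selection described in the Methods can be illustrated with a minimal Python sketch. It is not EchoAgent's implementation: the names `FrameFeasibility`, `select_frames`, and `run_measurement`, the 0.5 threshold, and the averaging of per-frame values are all assumptions made for illustration, and the measurement tools are passed in as plain callables rather than real vision models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class FrameFeasibility:
    """Hypothetical per-frame output of a measurement-feasibility model."""
    frame_index: int
    scores: Dict[str, float]  # measurement name -> feasibility score in [0, 1]


def select_frames(feasibility: List[FrameFeasibility],
                  measurement: str,
                  threshold: float = 0.5) -> List[int]:
    """Return indices of frames where the measurement is judged feasible.

    The threshold is an assumed hyperparameter, not a value from the paper.
    """
    return [f.frame_index for f in feasibility
            if f.scores.get(measurement, 0.0) >= threshold]


def run_measurement(measurement: str,
                    feasibility: List[FrameFeasibility],
                    tools: Dict[str, Callable[[int], float]],
                    threshold: float = 0.5) -> Optional[dict]:
    """Invoke a measurement tool only on frames that pass the feasibility gate.

    Returns None when no tool is registered or no frame is measurable, so the
    caller (for example, an LLM controller) can report the measurement as
    unavailable instead of producing an unreliable number.
    """
    tool = tools.get(measurement)
    frames = select_frames(feasibility, measurement, threshold)
    if tool is None or not frames:
        return None
    values = [tool(i) for i in frames]
    return {
        "measurement": measurement,
        "frames": frames,                    # kept so the result stays traceable
        "value": sum(values) / len(values),  # assumed aggregation: simple mean
    }
```

Returning the contributing frame indices alongside the value is one way to keep each number tied to its visual evidence, in the spirit of the traceability the abstract emphasizes.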