AI Summary
Current intelligent echocardiography analysis methods often rely on superficial template-based shortcuts, leading to unreliable explanations that hinder clinical decision support. To address this limitation, this work proposes EchoTrust, a novel Actor-Verifier dual-agent framework specifically designed for echocardiographic interpretation. In this architecture, the Actor generates structured intermediate representations, while the Verifier evaluates these outputs against visual and textual evidence, enabling collaborative, traceable, and structured reasoning within a vision-language model. This approach significantly enhances model reliability and transparency in high-stakes clinical settings, effectively mitigating dependence on spurious surface cues and improving the robustness and clinical applicability of echocardiography question-answering systems.
Abstract
Echocardiography plays an important role in the screening and diagnosis of cardiovascular diseases. However, automated intelligent analysis of echocardiographic data remains challenging due to complex cardiac dynamics and strong view heterogeneity. In recent years, vision-language models (VLMs) have opened a new avenue for building ultrasound understanding systems for clinical decision support. Nevertheless, most existing methods formulate this task as a direct mapping from video and question to answer, making them vulnerable to template shortcuts and spurious explanations. To address these issues, we propose EchoTrust, an evidence-driven Actor-Verifier framework for trustworthy reasoning in echocardiography VLM-based agents. EchoTrust produces a structured intermediate representation that is subsequently analyzed by distinct roles, enabling more reliable and interpretable decision-making for high-stakes clinical applications.
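The Actor-Verifier pattern described in the abstract can be sketched as a minimal control loop: the Actor drafts a structured intermediate representation, and the Verifier accepts it only if every observation is grounded in the available evidence. Note that all class names, fields, and the abstention policy below are illustrative assumptions for exposition, not EchoTrust's actual API:

```python
from dataclasses import dataclass

# Hypothetical structured intermediate representation: the Actor's
# findings, produced before any final answer is committed.
@dataclass
class Findings:
    view: str                 # e.g. "A4C" (apical four-chamber)
    observations: list[str]   # statements that must be evidence-grounded
    answer: str

class Actor:
    """Drafts structured findings from a question plus evidence.
    A real Actor would be a VLM conditioned on the echo video; this
    stub simply restates the given evidence as its observations."""
    def propose(self, question: str, evidence: set[str]) -> Findings:
        return Findings(view="A4C",
                        observations=sorted(evidence),
                        answer="reduced ejection fraction")

class Verifier:
    """Accepts findings only if every observation is supported by
    the visual/textual evidence set."""
    def check(self, f: Findings, evidence: set[str]) -> bool:
        return all(obs in evidence for obs in f.observations)

def answer(question: str, evidence: set[str], max_rounds: int = 3) -> str:
    actor, verifier = Actor(), Verifier()
    for _ in range(max_rounds):
        draft = actor.propose(question, evidence)
        if verifier.check(draft, evidence):
            # Traceable output: the answer is tied to verified observations.
            return draft.answer
    # No draft survived verification: abstain rather than guess.
    return "abstain"

if __name__ == "__main__":
    ev = {"LV wall motion reduced", "EF visually < 40%"}
    print(answer("What does the study suggest?", ev))
```

The key design choice this sketch illustrates is that the answer is never emitted directly from the question; it must pass through a structured representation that a separate role can audit, which is what blocks template shortcuts.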