🤖 AI Summary
This work addresses the limitations of existing multimodal large language models for electrocardiogram (ECG) analysis. These models lack multi-turn conversational ability, are not efficient enough for on-device deployment, and do not precisely interpret ECG measurements such as the PQRST intervals, so they fall short of real-world clinical interaction needs. To close this gap, we propose ECG-Agent, the first LLM-based tool-calling agent for multi-turn ECG dialogue, and build it in several sizes, from on-device capable models to larger agents. We also introduce ECG-Multi-Turn-Dialogue (ECG-MTD), a dataset of realistic user-assistant multi-turn dialogues covering diverse ECG lead configurations. Experimental results show that ECG-Agents outperform baseline ECG-LLMs in response accuracy, and that the on-device agents perform comparably to the larger agents in evaluations of response accuracy, tool-calling ability, and hallucination.
📝 Abstract
Multimodal Large Language Models (MLLMs) have recently been extended to electrocardiograms (ECGs), with work focusing on classification, report generation, and single-turn question answering. However, these models fall short in real-world scenarios: they lack multi-turn conversational ability, on-device efficiency, and a precise understanding of ECG measurements such as the PQRST intervals. To address these limitations, we introduce ECG-Agent, the first LLM-based tool-calling agent for multi-turn ECG dialogue. To support its development and evaluation, we also present the ECG-Multi-Turn-Dialogue (ECG-MTD) dataset, a collection of realistic user-assistant multi-turn dialogues across diverse ECG lead configurations. We develop ECG-Agents in various sizes, from on-device capable agents to larger ones. Experimental results show that ECG-Agents outperform baseline ECG-LLMs in response accuracy. Furthermore, the on-device agents achieve performance comparable to the larger agents in evaluations of response accuracy, tool-calling ability, and hallucination, demonstrating their viability for real-world applications.
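To illustrate the general tool-calling idea, the sketch below shows one way an agent could delegate PQRST interval measurement to a deterministic tool instead of estimating durations in free text. Everything here is hypothetical: the tool name `measure_intervals`, the JSON call format, the `dispatch` helper, and the assumption that fiducial points (P onset, QRS onset and offset, T offset) are already available as sample indices. The abstract does not describe ECG-Agent's actual tools or interfaces.

```python
import json


def measure_intervals(p_onset: int, qrs_onset: int, qrs_offset: int,
                      t_offset: int, fs: float) -> dict:
    """Hypothetical tool: convert fiducial sample indices into PR, QRS and QT
    durations in milliseconds, given the sampling rate fs in Hz."""
    # Fiducial points must occur in physiological order within one beat.
    if not (p_onset < qrs_onset < qrs_offset < t_offset):
        raise ValueError("fiducial points are out of order")
    ms_per_sample = 1000.0 / fs
    return {
        "PR_ms": (qrs_onset - p_onset) * ms_per_sample,    # P onset to QRS onset
        "QRS_ms": (qrs_offset - qrs_onset) * ms_per_sample,  # QRS duration
        "QT_ms": (t_offset - qrs_onset) * ms_per_sample,    # QRS onset to T offset
    }


# Hypothetical registry mapping tool names the LLM may emit to Python callables.
TOOLS = {"measure_intervals": measure_intervals}


def dispatch(tool_call_json: str) -> str:
    """Run a JSON tool call of the assumed form {"name": ..., "arguments": {...}}
    and return a JSON result that can be fed back to the model on the next turn."""
    call = json.loads(tool_call_json)
    fn = TOOLS[call["name"]]
    result = fn(**call["arguments"])
    return json.dumps({"name": call["name"], "result": result})
```

For example, at 500 Hz with P onset at sample 100, QRS onset at 180, QRS offset at 225 and T offset at 380, the tool returns PR = 160 ms, QRS = 90 ms and QT = 400 ms. Keeping the arithmetic in a tool means the model only has to choose the call and explain the numbers, which is one plausible reason a tool-calling design could help with precise measurements.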