🤖 AI Summary
ECG report generation is time-consuming and requires clinical expertise. To address this, the paper proposes Multimodal ECG Instruction Tuning (MEIT), which the authors describe as the first framework to tackle electrocardiogram (ECG) report generation with LLMs and multimodal instructions. Methodologically, it combines ECG time-series signal encoding, cross-modal alignment, and instruction tuning to generate clinical reports end to end from ECG signals. The authors also establish a benchmark for ECG report generation, evaluating MEIT with nine open-source LLM backbones on two large-scale ECG datasets comprising more than 800,000 ECG reports. Key contributions include: (1) a multimodal instruction-tuning paradigm for ECG report generation; (2) alignment of signal and report representations, together with zero-shot capability; and (3) strong report quality and resilience to signal perturbation, which the authors present as evidence of potential for real-world clinical application.
📝 Abstract
The electrocardiogram (ECG) is the primary non-invasive diagnostic tool for monitoring cardiac conditions and is crucial in assisting clinicians. Recent studies have concentrated on classifying cardiac conditions using ECG data but have overlooked ECG report generation, which is time-consuming and requires clinical expertise. To automate ECG report generation and ensure its versatility, we propose the Multimodal ECG Instruction Tuning (MEIT) framework, the first attempt to tackle ECG report generation with LLMs and multimodal instructions. To facilitate future research, we establish a benchmark to evaluate MEIT with various LLM backbones across two large-scale ECG datasets. Our approach uniquely aligns the representations of the ECG signal and the report, and we conduct extensive experiments to benchmark MEIT with nine open-source LLMs using more than 800,000 ECG reports. The results underscore the superior performance of instruction-tuned LLMs, showcasing their proficiency in high-quality report generation, zero-shot capabilities, and resilience to signal perturbation. These findings emphasize the efficacy of our MEIT framework and its potential for real-world clinical application.