🤖 AI Summary
A publicly available, clinically deployable multimodal electrocardiogram (ECG) dataset—particularly one enabling precise synchronization across raw waveforms, diagnostic images, and explanatory clinical text—is currently lacking. Method: We introduce MEETI, the first large-scale, synchronously aligned four-modal ECG benchmark, derived from MIMIC-IV-ECG. It unifies raw ECG waveforms, high-resolution diagnostic images, beat-level quantitative parameters, and clinical interpretations generated by large language models, with cross-modal consistency ensured via unique identifiers. All data and preprocessing code are openly released. Contribution/Results: MEETI fills a critical gap in fine-grained multimodal cardiovascular AI research. It enables rigorous evaluation of model interpretability, facilitates clinical translation, and establishes a standardized benchmark for multimodal fusion, cross-modal reasoning, and trustworthy AI assessment in cardiology.
📝 Abstract
The electrocardiogram (ECG) plays a foundational role in modern cardiovascular care, enabling non-invasive diagnosis of arrhythmias, myocardial ischemia, and conduction disorders. While machine learning has achieved expert-level performance in ECG interpretation, the development of clinically deployable multimodal AI systems remains constrained, primarily due to the lack of publicly available datasets that simultaneously incorporate raw signals, diagnostic images, and interpretation text. Most existing ECG datasets provide only single-modality data or, at most, dual modalities, making it difficult to build models that can understand and integrate diverse ECG information in real-world settings. To address this gap, we introduce MEETI (MIMIC-IV-Ext ECG-Text-Image), the first large-scale ECG dataset that synchronizes raw waveform data, high-resolution plotted images, and detailed textual interpretations generated by large language models. In addition, MEETI includes beat-level quantitative ECG parameters extracted from each lead, offering structured features that support fine-grained analysis and model interpretability. Each MEETI record is aligned across four components: (1) the raw ECG waveform, (2) the corresponding plotted image, (3) extracted feature parameters, and (4) detailed interpretation text. This alignment is achieved using consistent, unique identifiers. This unified structure enables transformer-based multimodal learning and supports fine-grained, interpretable reasoning about cardiac health. By bridging the gap between traditional signal analysis, image-based interpretation, and language-driven understanding, MEETI establishes a robust foundation for the next generation of explainable, multimodal cardiovascular AI. It offers the research community a comprehensive benchmark for developing and evaluating ECG-based AI systems.
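The four-way, identifier-keyed alignment described above could be represented roughly as follows. This is a minimal illustrative sketch only: the class, field names, and file paths are hypothetical and do not reflect MEETI's actual schema or loading API.

```python
from dataclasses import dataclass, field

@dataclass
class MEETIRecord:
    """Hypothetical container for one four-modal MEETI record.

    All attribute names are illustrative assumptions, not the
    dataset's real schema.
    """
    record_id: str           # unique identifier shared by all four modalities
    waveform_path: str       # raw ECG waveform file
    image_path: str          # high-resolution plotted ECG image
    params: dict = field(default_factory=dict)  # beat-level parameters per lead
    interpretation: str = "" # LLM-generated textual interpretation

def is_aligned(rec: MEETIRecord) -> bool:
    """Check that the modality files embed the record's shared identifier."""
    return all(rec.record_id in path for path in (rec.waveform_path, rec.image_path))

# Example record with invented values, for illustration only.
rec = MEETIRecord(
    record_id="40689238",
    waveform_path="waveforms/40689238.dat",
    image_path="images/40689238.png",
    params={"lead_II": {"pr_ms": 158, "qrs_ms": 92}},
    interpretation="Sinus rhythm; intervals within normal limits.",
)
```

A consumer would group files by `record_id` before training, so that each waveform, image, parameter set, and interpretation fed to a multimodal model is guaranteed to describe the same ECG acquisition.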