🤖 AI Summary
To address the limited clinical trust in ECG classification models that stems from their black-box nature, this paper proposes a three-stage knowledge-transfer training paradigm. A self-supervised joint-embedding pretraining stage transfers multimodal clinical knowledge (laboratory exams, vitals, and biometrics) into a unimodal ECG encoder, so that only the ECG signal is needed at inference. In addition, the model is trained to predict abnormal laboratory values from the ECG embedding, which serves as an indirect, physiologically grounded explanation of its diagnostic outputs. Evaluated on multi-label diagnosis classification with MIMIC-IV-ECG, the method outperforms a signal-only baseline and closes a substantial portion of the performance gap to a fully multimodal model that requires all data at inference. Key contributions are: (i) a novel approach to distilling multimodal clinical knowledge into a unimodal ECG model without requiring multimodal inputs at inference; and (ii) the use of measurable laboratory abnormalities as indirect, physiology-aligned anchors for interpreting model decisions, linking algorithmic interpretability with clinical utility.
📝 Abstract
Deep learning models have shown high accuracy in classifying electrocardiograms (ECGs), but their black-box nature hinders clinical adoption due to a lack of trust and interpretability. To address this, we propose a novel three-stage training paradigm that transfers knowledge from multimodal clinical data (laboratory exams, vitals, biometrics) into a powerful, yet unimodal, ECG encoder. We employ a self-supervised, joint-embedding pre-training stage to create an ECG representation that is enriched with contextual clinical information, while requiring only the ECG signal at inference time. Furthermore, as an indirect way to explain the model's output, we train it to also predict associated laboratory abnormalities directly from the ECG embedding. Evaluated on the MIMIC-IV-ECG dataset, our model outperforms a standard signal-only baseline in multi-label diagnosis classification and bridges a substantial portion of the performance gap to a fully multimodal model that requires all data at inference. Our work demonstrates a practical and effective method for creating more accurate and trustworthy ECG classification models. By converting abstract predictions into physiologically grounded explanations, our approach offers a promising path toward the safer integration of AI into clinical workflows.
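The two training signals described above can be illustrated with a minimal NumPy sketch. The abstract does not state which joint-embedding objective or prediction head is used, so everything here is an assumption for illustration: a symmetric InfoNCE (CLIP-style) loss stands in for the joint-embedding stage, a single linear layer with binary cross-entropy stands in for the laboratory-abnormality head, and the names `joint_embedding_loss`, `lab_abnormality_loss`, and `temperature` are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def joint_embedding_loss(ecg_emb, clin_emb, temperature=0.1):
    """Symmetric InfoNCE loss (assumed objective, not confirmed by the source).

    Row i of ecg_emb and row i of clin_emb are assumed to come from the
    same patient record; all other rows in the batch act as negatives.
    """
    z_ecg = l2_normalize(ecg_emb)
    z_clin = l2_normalize(clin_emb)
    logits = z_ecg @ z_clin.T / temperature  # (batch, batch) similarities
    idx = np.arange(logits.shape[0])
    # ECG -> clinical direction: each row should pick its own column.
    loss_e2c = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Clinical -> ECG direction: each column should pick its own row.
    loss_c2e = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_e2c + loss_c2e)


def lab_abnormality_loss(ecg_emb, weights, bias, labels):
    """Multi-label binary cross-entropy for a linear head on the ECG embedding.

    labels[i, j] is 1.0 if lab test j is abnormal for record i, else 0.0.
    """
    logits = ecg_emb @ weights + bias
    # Stable BCE-with-logits: max(x, 0) - x*y + log(1 + exp(-|x|)).
    per_label = (
        np.maximum(logits, 0.0)
        - logits * labels
        + np.log1p(np.exp(-np.abs(logits)))
    )
    return per_label.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ecg = rng.normal(size=(8, 16))                      # toy ECG embeddings
    clin = ecg + 0.1 * rng.normal(size=(8, 16))         # toy paired clinical embeddings
    labs = (rng.random(size=(8, 5)) > 0.5).astype(float)  # toy abnormality flags
    head_w = np.zeros((16, 5))
    head_b = np.zeros(5)
    print("joint-embedding loss:", joint_embedding_loss(ecg, clin))
    print("lab-abnormality loss:", lab_abnormality_loss(ecg, head_w, head_b, labs))
```

In this sketch only the ECG embedding feeds the laboratory head, which mirrors the stated goal of needing just the ECG signal at inference; the clinical embeddings appear only in the pretraining loss.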