🤖 AI Summary
Problem: Existing multimodal ECG representation learning methods struggle with report-signal alignment and rely heavily on complete 12-lead recordings, which limits their applicability in resource-constrained settings. Method: We propose a knowledge-enhanced dynamic multimodal alignment framework featuring: (1) a novel lead-agnostic dynamic masking mechanism that accepts arbitrary lead combinations as input; (2) structured clinical knowledge extracted from free-text reports via large language models (LLMs) and integrated into the ECG encoder; and (3) a joint optimization objective combining lead-aware contrastive learning and cross-modal alignment. Results: Our method achieves state-of-the-art zero-shot classification and linear-probe performance across six external ECG datasets. Under partial-lead configurations, it yields an average 16% improvement in zero-shot AUC and, for the first time, enables cross-lead zero-shot generalization without requiring full 12-lead inputs.
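To make the cross-modal alignment part of the objective more concrete, here is a minimal NumPy sketch of a symmetric contrastive (InfoNCE, CLIP-style) loss between paired ECG and report embeddings. This is an assumption about the general family of objective, not the exact loss used by the authors; the function names, the temperature value, and the embedding shapes are all hypothetical.

```python
import numpy as np

def log_softmax(x, axis):
    # Numerically stable log-softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def symmetric_contrastive_loss(ecg_emb, text_emb, temperature=0.07):
    """Hypothetical CLIP-style loss for a batch of paired embeddings.

    ecg_emb, text_emb: arrays of shape (batch, dim); row i of each is a pair.
    """
    # L2-normalize so the dot product is a cosine similarity.
    ecg = ecg_emb / np.linalg.norm(ecg_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = ecg @ txt.T / temperature  # shape (batch, batch)
    idx = np.arange(logits.shape[0])
    # Matching pairs sit on the diagonal; score them in both directions.
    loss_ecg_to_text = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_text_to_ecg = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_ecg_to_text + loss_text_to_ecg)

# Toy check with random embeddings (assumed shapes, not real data).
rng = np.random.default_rng(0)
emb = rng.standard_normal((8, 16))
aligned = symmetric_contrastive_loss(emb, emb)
misaligned = symmetric_contrastive_loss(emb, np.roll(emb, 1, axis=0))
```

In this toy setup the correctly paired batch should score a lower loss than the batch whose reports have been shifted by one position, which is the behavior the alignment term is meant to reward.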
📝 Abstract
Recent advances in multimodal ECG representation learning center on aligning ECG signals with paired free-text reports. However, suboptimal alignment persists due to the complexity of medical language and the reliance on a full 12-lead setup, which is often unavailable in under-resourced settings. To tackle these issues, we propose **K-MERL**, a knowledge-enhanced multimodal ECG representation learning framework. **K-MERL** leverages large language models to extract structured knowledge from free-text reports and employs a lead-aware ECG encoder with dynamic lead masking to accommodate arbitrary lead inputs. Evaluations on six external ECG datasets show that **K-MERL** achieves state-of-the-art performance in zero-shot classification and linear probing tasks, while delivering an average **16%** AUC improvement over existing methods in partial-lead zero-shot classification.
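As an illustration of what dynamic lead masking could look like during training, the sketch below randomly keeps a subset of leads and zeroes out the rest. It is a simplified assumption, not the K-MERL implementation: the function name `random_lead_mask`, the `min_keep` parameter, zero-filling as the masking strategy, and the 12 x 5000 input shape are all hypothetical.

```python
import numpy as np

def random_lead_mask(ecg, min_keep=1, rng=None):
    """Zero out a random subset of leads in a (leads, samples) ECG array.

    Returns the masked signal and a boolean vector marking the kept leads.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_leads = ecg.shape[0]
    # Draw how many leads to keep (between min_keep and all), then which ones.
    n_keep = rng.integers(min_keep, n_leads + 1)
    kept = np.zeros(n_leads, dtype=bool)
    kept[rng.choice(n_leads, size=n_keep, replace=False)] = True
    # Broadcast the per-lead mask across the time axis.
    masked = ecg * kept[:, None]
    return masked, kept

# Toy usage on synthetic noise standing in for a 12-lead recording.
rng = np.random.default_rng(0)
ecg = rng.standard_normal((12, 5000))
masked, kept = random_lead_mask(ecg, min_keep=1, rng=rng)
```

Returning the `kept` vector alongside the masked signal lets a downstream encoder know which leads are present, which is one plausible way to make the encoder lead-aware.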