Knowledge-enhanced Multimodal ECG Representation Learning with Arbitrary-Lead Inputs

📅 2025-02-25
🤖 AI Summary
Problem: Existing multimodal ECG representation learning methods align reports and signals poorly and rely heavily on complete 12-lead recordings, which limits their applicability in resource-constrained settings. Method: We propose a knowledge-enhanced dynamic multimodal alignment framework featuring: (1) a lead-agnostic dynamic masking mechanism that accepts arbitrary lead combinations as input; (2) structured clinical knowledge extracted from free-text reports by large language models (LLMs) and integrated into the ECG encoder; and (3) a joint optimization objective combining lead-aware contrastive learning and cross-modal alignment. Results: Our method achieves state-of-the-art zero-shot classification and linear-probe performance across six external ECG datasets. Under partial-lead configurations, it yields an average 16% improvement in zero-shot AUC and, for the first time, enables cross-lead zero-shot generalization without requiring full 12-lead inputs.
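The cross-modal alignment part of the objective can be pictured with a CLIP-style symmetric contrastive loss. This is only a minimal sketch: the summary does not give the exact loss, so the symmetric InfoNCE form, the function names, and the `temperature` value are all assumptions made for illustration.

```python
import numpy as np

def log_softmax(x, axis):
    # Numerically stable log-softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def symmetric_contrastive_loss(ecg_emb, text_emb, temperature=0.07):
    """Assumed CLIP-style loss; row i of each array is one ECG/report pair."""
    # L2-normalise so the dot product is a cosine similarity.
    ecg = ecg_emb / np.linalg.norm(ecg_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = ecg @ txt.T / temperature  # shape (batch, batch)
    idx = np.arange(logits.shape[0])    # matching pairs lie on the diagonal
    loss_ecg_to_text = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_text_to_ecg = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_ecg_to_text + loss_text_to_ecg)
```

The loss is small when each ECG embedding is closest to its own report embedding and grows when pairs are mismatched.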

📝 Abstract
Recent advances in multimodal ECG representation learning center on aligning ECG signals with paired free-text reports. However, suboptimal alignment persists due to the complexity of medical language and the reliance on a full 12-lead setup, which is often unavailable in under-resourced settings. To tackle these issues, we propose **K-MERL**, a knowledge-enhanced multimodal ECG representation learning framework. **K-MERL** leverages large language models to extract structured knowledge from free-text reports and employs a lead-aware ECG encoder with dynamic lead masking to accommodate arbitrary lead inputs. Evaluations on six external ECG datasets show that **K-MERL** achieves state-of-the-art performance in zero-shot classification and linear probing tasks, while delivering an average **16%** AUC improvement over existing methods in partial-lead zero-shot classification.
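Zero-shot classification with an aligned ECG and text embedding space is commonly done by comparing an ECG embedding against one text embedding per class. The sketch below assumes that setup; the abstract does not describe the inference procedure, and `zero_shot_scores` and its inputs are hypothetical names standing in for the outputs of trained encoders.

```python
import numpy as np

def zero_shot_scores(ecg_emb, class_text_emb):
    """Cosine similarity between each ECG and each class prompt.

    ecg_emb:        (n_samples, d) array from an ECG encoder (assumed given)
    class_text_emb: (n_classes, d) array from a text encoder (assumed given)
    """
    ecg = ecg_emb / np.linalg.norm(ecg_emb, axis=1, keepdims=True)
    txt = class_text_emb / np.linalg.norm(class_text_emb, axis=1, keepdims=True)
    return ecg @ txt.T  # shape (n_samples, n_classes)

def zero_shot_predict(ecg_emb, class_text_emb):
    # Predict the class whose prompt embedding is most similar.
    return zero_shot_scores(ecg_emb, class_text_emb).argmax(axis=1)
```

Because no classifier is trained, the per-class similarity scores can be fed directly into an AUC computation against the ground-truth labels.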
Problem

Research questions and friction points this paper is trying to address.

Improving ECG representation learning with multimodal (signal and report) data
Fixing suboptimal alignment between ECG signals and their paired text reports
Supporting arbitrary lead inputs for ECG classification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Knowledge-enhanced framework
Dynamic lead masking
Arbitrary lead inputs
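One way to picture dynamic lead masking is to zero out a random subset of leads during training so the encoder learns to work with whatever leads are present. The sketch below is an illustration under that assumption, not the paper's implementation: the `(leads, samples)` layout, the zero-fill strategy, and the `mask_leads` helper are all hypothetical.

```python
import numpy as np

def mask_leads(ecg, keep=None, rng=None):
    """Zero out every lead not in `keep`.

    ecg:  array of shape (n_leads, n_samples), e.g. (12, 5000)
    keep: indices of leads to keep; if None, a random non-empty
          subset is sampled (the assumed training-time behaviour)
    Returns the masked signal and a boolean per-lead mask.
    """
    n_leads = ecg.shape[0]
    if rng is None:
        rng = np.random.default_rng()
    if keep is None:
        k = rng.integers(1, n_leads + 1)  # keep between 1 and n_leads leads
        keep = rng.choice(n_leads, size=k, replace=False)
    mask = np.zeros(n_leads, dtype=bool)
    mask[np.asarray(keep)] = True
    # Broadcast the per-lead mask over the time axis.
    return ecg * mask[:, None], mask
```

At inference, passing `keep` explicitly (for example `keep=[0, 1]` for a two-lead recording) mimics a partial-lead input; the returned mask could also be given to the encoder so it knows which leads are real.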