🤖 AI Summary
Existing ECG–text contrastive learning methods struggle to model fine-grained waveform features and diagnostic reasoning because clinical reports rarely describe the waveforms explicitly. To address this, we propose the first contrastive learning framework explicitly designed for ECG waveform feature completion and semantic alignment: (1) using large language models (LLMs) to infer and reconstruct the waveform descriptions missing from reports; (2) constructing a semantic similarity matrix to guide fine-grained contrastive learning and mitigate false negatives; and (3) introducing a sigmoid-based multi-label loss function tailored to weakly supervised settings in which each ECG carries multiple waveform findings and diagnoses. Evaluated on six benchmark datasets, the method achieves state-of-the-art zero-shot transfer and linear probe performance, making it the first work to enable waveform-level semantic alignment for interpretable representation learning in ECG diagnosis.
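Item (2) can be made concrete with a small sketch. The summary does not say how the similarity matrix is computed, so the following assumes, purely for illustration, that each report is reduced to a set of diagnosis labels and that similarity is the cosine between multi-hot label vectors; a sentence-embedding model over the report text would be a natural substitute. The helpers `multi_hot`, `cosine`, `semantic_similarity_matrix`, and `false_negative_pairs`, the label vocabulary, and the 0.5 threshold are all hypothetical.

```python
import math

# ASSUMPTION: each report is reduced to a set of diagnosis labels; FG-CLEP's
# actual similarity computation may differ (e.g., text-embedding similarity).


def multi_hot(labels, vocab):
    """Encode a set of diagnosis labels as a 0/1 vector over a fixed vocabulary."""
    return [1.0 if term in labels else 0.0 for term in vocab]


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def semantic_similarity_matrix(reports, vocab):
    """Pairwise report-to-report similarity for a batch (hypothetical helper)."""
    vecs = [multi_hot(r, vocab) for r in reports]
    return [[cosine(u, v) for v in vecs] for u in vecs]


def false_negative_pairs(sim, threshold):
    """Off-diagonal (i, j) pairs that one-hot contrastive targets would treat as negatives
    even though their reports are at least `threshold` similar."""
    n = len(sim)
    return [(i, j) for i in range(n) for j in range(n) if i != j and sim[i][j] >= threshold]


vocab = ["sinus rhythm", "left axis deviation", "atrial fibrillation"]
reports = [{"sinus rhythm", "left axis deviation"}, {"sinus rhythm"}, {"atrial fibrillation"}]
S = semantic_similarity_matrix(reports, vocab)
print(false_negative_pairs(S, threshold=0.5))  # [(0, 1), (1, 0)]
```

Reports 0 and 1 both mention sinus rhythm, so a standard objective that treats ECG 0 and report 1 as a negative pair would push apart semantically related samples; a similarity matrix of this kind makes such pairs visible so the training targets can account for them.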
📝 Abstract
Electrocardiograms (ECGs) are essential for diagnosing cardiovascular diseases. While previous ECG-text contrastive learning methods have shown promising results, they often overlook the incompleteness of the reports. Given an ECG, a report is produced by first identifying key waveform features and then inferring the final diagnosis from these features. Despite their importance, these waveform features, as intermediate results, are often not recorded in the report. Aligning ECGs with such incomplete reports impedes the model's ability to capture the ECG's waveform features and limits its understanding of diagnostic reasoning based on those features. To address this, we propose FG-CLEP (Fine-Grained Contrastive Language ECG Pre-training), which uses large language models (LLMs) to recover these waveform features from incomplete reports while handling two challenges: LLM hallucinations and the non-bijective relationship between waveform features and diagnoses. Additionally, because a few common diagnoses dominate ECG data, many pairs treated as negatives actually share diagnoses (false negatives); we therefore introduce a semantic similarity matrix to guide contrastive learning. Furthermore, we adopt a sigmoid-based loss function to accommodate the multi-label nature of ECG-related tasks. Experiments on six datasets demonstrate that FG-CLEP outperforms state-of-the-art methods in both zero-shot prediction and linear probing.
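The sigmoid-based loss and the similarity guidance can be combined as a per-pair binary cross-entropy. The sketch below follows the general SigLIP-style formulation (a temperature and bias applied to pairwise dot-product logits, with the sum over all pairs divided by the batch size) and replaces the identity target matrix with soft targets; it is an illustration under stated assumptions, not FG-CLEP's published loss. Embeddings are assumed L2-normalized, the temperature and bias are fixed rather than learned for simplicity, and the names `sigmoid_contrastive_loss`, `targets`, `temperature`, and `bias` are hypothetical.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def sigmoid_contrastive_loss(ecg_emb, text_emb, targets, temperature=10.0, bias=-10.0):
    """Sigmoid (per-pair binary) contrastive loss between N ECG and N report embeddings.

    ASSUMPTION: targets[i][j] in [0, 1] is a soft label for (ECG i, report j).
    The identity matrix gives a plain SigLIP-style loss; raising off-diagonal
    entries for reports that share diagnoses softens false-negative penalties.
    """
    n = len(ecg_emb)
    total = 0.0
    for i in range(n):
        for j in range(n):
            # Pair logit: scaled dot product of (assumed L2-normalized) embeddings plus bias
            logit = temperature * sum(a * b for a, b in zip(ecg_emb[i], text_emb[j])) + bias
            y = targets[i][j]
            # Binary cross-entropy on the logit: -[y log sigma(z) + (1 - y) log sigma(-z)]
            total -= y * log_sigmoid(logit) + (1.0 - y) * log_sigmoid(-logit)
    return total / n  # normalize by batch size, as in SigLIP


E = [[1.0, 0.0], [0.0, 1.0]]         # two L2-normalized embeddings
identity = [[1.0, 0.0], [0.0, 1.0]]  # plain one-to-one targets
print(sigmoid_contrastive_loss(E, E, identity))        # matched pairs: lower loss
print(sigmoid_contrastive_loss(E, E[::-1], identity))  # mismatched pairs: higher loss
```

Because each pair is scored independently, an ECG can be pulled toward several reports at once, which suits multi-label data better than a softmax over the batch, where the probabilities assigned to candidate reports must sum to one.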