AI Summary
This work addresses the scarcity of labeled electrocardiogram (ECG) data and the high cost of expert annotation by proposing CoRe-ECG, a self-supervised pretraining framework that jointly models global semantics and local structure through collaborative contrastive and reconstructive learning. The method introduces two components: Frequency Dynamic Augmentation (FDA), which perturbs ECG signals according to their frequency-domain importance, and Spatio-Temporal Dual Masking (STDM), which reduces the model's reliance on spurious linear shortcuts across ECG leads. Evaluated on multiple downstream ECG datasets, CoRe-ECG is reported to achieve state-of-the-art performance, and ablation studies support the necessity and complementarity of its core components.
Abstract
Accurate interpretation of electrocardiogram (ECG) signals remains challenging due to the scarcity of labeled data and the high cost of expert annotation. Self-supervised learning (SSL) offers a promising solution by enabling models to learn expressive representations from unlabeled signals. Existing ECG SSL methods typically rely on either contrastive learning or reconstructive learning. However, each approach in isolation provides limited supervisory signals and suffers from additional limitations, including non-physiological distortions introduced by naive augmentations and trivial correlations across multiple leads that models may exploit as shortcuts. In this work, we propose CoRe-ECG, a unified contrastive and reconstructive pretraining paradigm that establishes a synergistic interaction between global semantic modeling and local structural learning. CoRe-ECG aligns global representations during reconstruction, enabling instance-level discriminative signals to guide local waveform recovery. To further enhance pretraining, we introduce Frequency Dynamic Augmentation (FDA) to adaptively perturb ECG signals based on their frequency-domain importance, and Spatio-Temporal Dual Masking (STDM) to break linear dependencies across leads, increasing the difficulty of reconstructive tasks. Our method achieves state-of-the-art performance across multiple downstream ECG datasets. Ablation studies further demonstrate the necessity and complementarity of each component. This approach provides a robust and physiologically meaningful representation learning framework for ECG analysis.
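The abstract does not say how STDM selects what to hide, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes a recording is split into equal-length time patches per lead, and that the mask is the union of a time-aligned component (the same patches hidden in every lead) and a lead-wise component (whole leads hidden). The function names, the ratios, and the lead ordering are all hypothetical.

```python
import random


def dual_mask(n_leads, n_patches, time_ratio=0.3, lead_ratio=0.25, seed=0):
    """Build mask[lead][patch]; True means the patch is hidden from the encoder.

    Assumed scheme (not taken from the paper): hide a random subset of time
    patches in ALL leads, and additionally hide a random subset of whole leads.
    """
    rng = random.Random(seed)
    n_time = max(1, round(time_ratio * n_patches))
    n_lead = max(1, round(lead_ratio * n_leads))
    hidden_times = set(rng.sample(range(n_patches), n_time))
    hidden_leads = set(rng.sample(range(n_leads), n_lead))
    mask = [
        [(lead in hidden_leads) or (t in hidden_times) for t in range(n_patches)]
        for lead in range(n_leads)
    ]
    return mask, hidden_leads, hidden_times


def visible_leads_at(mask, patch):
    """Return the leads whose signal is still visible at the given time patch."""
    return [lead for lead, row in enumerate(mask) if not row[patch]]


if __name__ == "__main__":
    mask, hidden_leads, hidden_times = dual_mask(n_leads=12, n_patches=10)
    for t in sorted(hidden_times):
        # At a time-masked patch no lead is visible, so the waveform cannot be
        # copied from another lead recorded at the same instant.
        print(t, visible_leads_at(mask, t))
```

In this sketch the time-aligned component is what removes the same-instant shortcut: standard limb leads satisfy Einthoven's law (lead III = lead II - lead I), so a patch hidden in only one lead could be recovered by a linear combination of the others, whereas a patch hidden in every lead has to be reconstructed from temporal context.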