🤖 AI Summary
Electrocardiogram (ECG) arrhythmia classification is hampered by high signal variability, strong noise interference, scarce labeled data, and a trade-off between accuracy and efficiency. To address these challenges, this work proposes ECG-NAT, a novel two-stage self-supervised learning framework. The method first pretrains a masked autoencoder generatively on multi-source ECG data, then performs discriminative fine-tuning that jointly optimizes supervised contrastive and cross-entropy losses. A key innovation is a hierarchical neighborhood attention mechanism that efficiently captures multi-scale temporal features, from individual heartbeat morphology to global rhythm patterns. Experiments show that ECG-NAT reaches 88.1% accuracy on standard benchmarks using only 1% of the labeled data, maintaining superior classification performance at substantially lower computational cost, which makes it well suited for real-time ECG diagnosis.
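The neighborhood attention idea can be illustrated with a short NumPy sketch. This is a minimal single-head, 1D illustration under stated assumptions, not the ECG-NAT implementation: the function names, the window size of 7, the random projection matrices, the residual connection, and the mean-pooling downsampling between stages are all hypothetical choices. Each query token attends only to a fixed-size window of neighboring tokens (shifted inward at the sequence borders, as in the Neighborhood Attention Transformer), so the cost grows as O(T·k) in sequence length T and window k rather than O(T²). Stacking stages with downsampling lets the same window cover progressively longer stretches of signal.

```python
import numpy as np


def neighborhood_attention_1d(x, w_q, w_k, w_v, window):
    """Single-head 1D neighborhood attention (illustrative sketch).

    x: (T, d) token embeddings, e.g. patch embeddings of one ECG lead.
    w_q, w_k, w_v: (d, d) projection matrices (hypothetical, randomly set below).
    window: odd neighborhood size; every query attends to exactly `window` keys.
    """
    T = x.shape[0]
    if window % 2 == 0 or window > T:
        raise ValueError("window must be odd and no larger than the sequence length")
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scale = 1.0 / np.sqrt(q.shape[-1])
    half = window // 2
    out = np.empty_like(v)
    for i in range(T):
        # Center the neighborhood on i, shifting it inward at the borders so
        # edge tokens still see `window` keys (NAT-style, not zero padding).
        start = min(max(i - half, 0), T - window)
        keys, vals = k[start:start + window], v[start:start + window]
        scores = keys @ q[i] * scale
        weights = np.exp(scores - scores.max())  # numerically stable softmax
        weights /= weights.sum()
        out[i] = weights @ vals
    return out


def downsample(x):
    """Halve the temporal resolution by averaging adjacent token pairs."""
    T = x.shape[0] // 2 * 2  # drop a trailing token if the length is odd
    return x[:T].reshape(-1, 2, x.shape[1]).mean(axis=1)


def hierarchical_forward(x, stage_params, window=7):
    """Run attention stages with downsampling in between; return each stage's features."""
    features = []
    for w_q, w_k, w_v in stage_params:
        x = x + neighborhood_attention_1d(x, w_q, w_k, w_v, window)  # residual connection
        features.append(x)
        x = downsample(x)
    return features


rng = np.random.default_rng(0)
d = 16
tokens = rng.standard_normal((64, d))  # stand-in for 64 ECG patch embeddings
stage_params = [tuple(rng.standard_normal((d, d)) * 0.1 for _ in range(3)) for _ in range(3)]
feats = hierarchical_forward(tokens, stage_params)
print([f.shape for f in feats])  # [(64, 16), (32, 16), (16, 16)]
```

With a 7-token window, a third-stage query (after two halvings) attends to tokens that together summarize about 28 input positions, which conveys the intuition of moving from beat-level to rhythm-level context without ever computing full quadratic attention.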
📝 Abstract
Electrocardiogram (ECG) arrhythmia classification remains challenging due to signal variability, noise, limited labeled data, and the difficulty of building models that are both accurate and efficient. While self-supervised learning reduces label dependency, most methods target either global contextual features or local morphological patterns and rarely perform hierarchical multi-scale feature extraction. Yet ECG signals call for architectures that capture both fine-grained beat-level morphology and broader rhythm-level dependencies while remaining computationally efficient. To overcome this limitation, this paper proposes the Electrocardiogram Neighborhood Attention Transformer (ECG-NAT), a novel self-supervised learning approach tailored to multi-lead ECG classification. Our two-stage approach begins with generative pretraining, in which a masked autoencoder reconstructs partially masked ECG signals drawn from multiple diverse datasets, enabling the model to learn robust, domain-invariant representations from unlabeled data. This is followed by discriminative fine-tuning with a dual-loss function that combines supervised contrastive and cross-entropy losses, aligning representation learning with label prediction. ECG-NAT's hierarchical neighborhood attention mechanism captures multi-scale temporal features, from localized beat morphology to broader rhythm patterns, at low computational cost. ECG-NAT achieves robust performance on benchmark datasets, reaching 88.1% accuracy with only 1% of the labeled data, which demonstrates strong efficacy in low-resource settings. The framework combines superior classification performance with computational efficiency, making it practical for real-time ECG diagnosis. The code will be made available upon acceptance at https://github.com/Mahsagazeran/ECG-NAT.
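The dual-loss fine-tuning objective can be sketched in NumPy. The sketch pairs the standard supervised contrastive (SupCon) loss of Khosla et al. (2020), computed on L2-normalized embeddings, with ordinary cross-entropy on class logits. The abstract does not say how the two terms are weighted, so the convex combination controlled by a hypothetical `weight` parameter, the temperature of 0.1, the random stand-in inputs, and all function names are assumptions for illustration only.

```python
import numpy as np


def log_softmax(logits, axis=-1):
    """Numerically stable log-softmax along `axis`."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def cross_entropy(logits, labels):
    """Mean cross-entropy for raw logits (N, C) and integer labels (N,)."""
    logp = log_softmax(logits, axis=1)
    return -logp[np.arange(len(labels)), labels].mean()


def supervised_contrastive(embeddings, labels, temperature=0.1):
    """SupCon loss (Khosla et al., 2020) over a batch.

    embeddings: (N, D) projection-head outputs; labels: (N,) integer NumPy array.
    Anchors with no same-class partner in the batch are skipped.
    """
    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    positives = (labels[:, None] == labels[None, :]) & ~self_mask
    counts = positives.sum(axis=1)
    valid = counts > 0
    if not valid.any():
        return 0.0
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Cosine similarities scaled by temperature; an anchor never contrasts with itself.
    sim = np.where(self_mask, -np.inf, z @ z.T / temperature)
    logp = log_softmax(sim, axis=1)
    # Mean log-probability of each anchor's positives, averaged over valid anchors.
    pos_logp = np.where(positives, logp, 0.0).sum(axis=1)
    return -(pos_logp[valid] / counts[valid]).mean()


def dual_loss(embeddings, logits, labels, weight=0.5, temperature=0.1):
    """Hypothetical convex combination: weight * SupCon + (1 - weight) * CE."""
    supcon = supervised_contrastive(embeddings, labels, temperature)
    ce = cross_entropy(logits, labels)
    return weight * supcon + (1.0 - weight) * ce


rng = np.random.default_rng(0)
embeddings = rng.standard_normal((8, 32))  # stand-in encoder/projection outputs
logits = rng.standard_normal((8, 5))       # stand-in classifier outputs, 5 classes
labels = np.array([0, 0, 1, 1, 2, 2, 3, 4])
print(dual_loss(embeddings, logits, labels))
```

The contrastive term pulls embeddings of the same arrhythmia class together and pushes other classes apart, while the cross-entropy term trains the classifier head directly. In this sketch, anchors without a same-class partner in the batch (labels 3 and 4 above) contribute only to the cross-entropy term.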