🤖 AI Summary
How electrocardiogram (ECG) models scale remains unclear, since increasing model size does not reliably improve performance. This study systematically investigates how neural architecture (ResNet versus Transformer) and pretraining paradigm (supervised versus self-supervised) shape scaling behavior, training over 120 models of varying scales on the CODE dataset. For out-of-distribution generalization, self-supervised learning is substantially more data-efficient (up to 16×) and transfer-efficient (up to 7.6×) than supervised learning, while ResNet is 1.3–2.5× more parameter-efficient than Transformer. ResNet-based models achieve the lowest out-of-distribution loss across most observed scales, with self-supervised pretraining dominating on unseen clinical tasks, although self-supervised Transformers overtake them at very large model sizes. These findings indicate that developing effective ECG foundation models requires co-designing architecture and pretraining strategy rather than merely scaling up model size.
📝 Abstract
While scaling laws have established a fundamental framework for foundation models in natural language processing, their applicability to electrocardiogram (ECG) models remains poorly characterized. Indeed, recent studies show that increasing the model size or pre-training dataset size of ECG models does not always yield consistent downstream gains, leaving open the exact roles of architectural inductive biases and pre-training paradigms, as well as how much improvement to expect with scale. In this work, we systematically investigate neural and loss-to-loss scaling laws in the ECG domain. By pre-training over $120$ models (ranging from $20$K to $200$M parameters) on the large-scale CODE dataset ($2.3$M records), we decouple the effects of model architecture (ResNet vs. Transformer) from those of the pre-training paradigm, namely supervised learning (SL) versus self-supervised learning (SSL). We find that (i) in-distribution, SL models are bottlenecked by data, whereas SSL models scale robustly with both model and data size; (ii) for out-of-distribution (OOD) generalization, ResNets are $1.3$ to $2.5$ times more parameter-efficient than Transformers, while SSL is up to $16$ times more data-efficient than SL and achieves up to $7.6$ times higher transfer efficiency on unseen clinical tasks; (iii) across the observed scales, ResNet-based models generally achieve the lowest OOD loss, with SSL dominating on unseen clinical tasks, although self-supervised Transformers overtake them at very large model sizes. Our results suggest that the path to effective ECG foundation models lies in strategically aligning architecture with pre-training paradigm rather than in brute-force scaling.
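The abstract reports efficiency multipliers, such as ResNets being $1.3$ to $2.5$ times more parameter-efficient and SSL being up to $16$ times more data-efficient, but does not define them here. The sketch below shows one common way to read such a number. It is an assumption, not the paper's stated method: fit a saturating power law $L(N) = a N^{-\alpha} + c$ to loss versus size $N$ for each model family, then compare the sizes each family needs to reach the same target loss. The names `PowerLaw`, `fit_power_law`, and `efficiency_multiplier` are hypothetical. The grid-search fit is a simple stand-in for whatever fitting procedure the authors used, and all numbers are synthetic.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerLaw:
    """Saturating power law L(N) = a * N**(-alpha) + c (an assumed functional form)."""
    a: float
    alpha: float
    c: float  # irreducible loss: the floor L(N) approaches as N grows

    def loss(self, n: float) -> float:
        return self.a * n ** (-self.alpha) + self.c

    def size_for_loss(self, target: float) -> float:
        """Invert L(N) to get the size N at which the law predicts `target` loss."""
        if target <= self.c:
            raise ValueError("target loss must exceed the irreducible loss c")
        return (self.a / (target - self.c)) ** (1.0 / self.alpha)


def fit_power_law(sizes, losses, n_grid=200):
    """Grid-search c in [0, min(losses)); for each c, fit
    log(L - c) = log(a) - alpha * log(N) by ordinary least squares,
    and keep the fit with the smallest squared error in loss space."""
    xs = [math.log(n) for n in sizes]
    mean_x = sum(xs) / len(xs)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    floor = min(losses)
    best_sse, best_law = math.inf, None
    for i in range(n_grid):
        c = floor * i / n_grid  # always strictly below every observed loss
        ys = [math.log(l - c) for l in losses]
        mean_y = sum(ys) / len(ys)
        slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
        law = PowerLaw(a=math.exp(mean_y - slope * mean_x), alpha=-slope, c=c)
        sse = sum((l - law.loss(n)) ** 2 for n, l in zip(sizes, losses))
        if sse < best_sse:
            best_sse, best_law = sse, law
    return best_law


def efficiency_multiplier(baseline: PowerLaw, candidate: PowerLaw, target: float) -> float:
    """How many times more units (parameters or samples) `baseline` needs
    than `candidate` to reach the same `target` loss."""
    return baseline.size_for_loss(target) / candidate.size_for_loss(target)


# Usage on synthetic data generated from a known law (not results from the paper).
true_law = PowerLaw(a=5.0, alpha=0.3, c=1.0)
sizes = [1e4, 1e5, 1e6, 1e7, 1e8]
fitted = fit_power_law(sizes, [true_law.loss(n) for n in sizes])
print(f"alpha ~ {fitted.alpha:.3f}, c ~ {fitted.c:.3f}")

# Two hypothetical model families with the same exponent but different prefactors.
family_a = PowerLaw(a=10.0, alpha=0.5, c=1.0)
family_b = PowerLaw(a=5.0, alpha=0.5, c=1.0)
print(efficiency_multiplier(family_a, family_b, target=2.0))  # 100 / 25 = 4.0
```

When two families differ in their exponent $\alpha$ or their floor $c$, this multiplier changes with the chosen target loss. Under this reading, a range of values is therefore natural rather than a single constant.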