Pretraining Strategies and Scaling for ECG Foundation Models: A Systematic Study

📅 2026-05-12

🤖 AI Summary

This study addresses the lack of systematic, like-for-like evaluation of pretraining methodologies and data-scaling effects for electrocardiogram (ECG) foundation models. Within a unified framework, it compares five contrastive and non-contrastive self-supervised learning objectives, including contrastive predictive coding (CPC) and the Joint Embedding Predictive Architecture (JEPA), across three architecture families: structured state space models (SSMs), Transformers, and CNNs. Pretraining uses up to 11 million ECG samples, drawn exclusively from publicly available sources. For most objectives, downstream performance keeps improving as the pretraining set grows, and contrastive predictive coding yields the most transferable representations, slightly ahead of JEPA. Structured state space models clearly outperform both Transformers and CNNs across multiple clinical downstream tasks; the authors hypothesize that their inductive biases, rather than pretraining scale alone, are the primary driver of this transferability.
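Contrastive predictive coding is usually trained with an InfoNCE loss: a context vector c_t must identify the true future latent z_{t+k} among negatives drawn from other sequences in the batch. The sketch below shows that loss for a single prediction step k. It is a generic illustration under stated assumptions, not the authors' implementation: the function name `info_nce_loss`, the batch-as-negatives setup, and the bilinear score z^T W_k c are all hypothetical choices made here for clarity.

```python
import numpy as np


def info_nce_loss(context: np.ndarray, future: np.ndarray, w_k: np.ndarray) -> float:
    """Hypothetical InfoNCE loss for one CPC prediction step k.

    context: (N, d) context vectors c_t, one per sequence in the batch.
    future:  (N, d) encoder latents z_{t+k}; row i is the positive for
             context row i, and the other N - 1 rows serve as negatives.
    w_k:     (d, d) step-specific linear prediction matrix W_k.
    """
    predictions = context @ w_k.T          # row i is (W_k c_i)^T
    logits = predictions @ future.T        # logits[i, j] = z_j^T W_k c_i
    # Numerically stable log-softmax over each row (candidates for context i).
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal; the loss is their mean negative log-probability.
    return float(-np.mean(np.diag(log_probs)))


if __name__ == "__main__":
    n = 8
    latents = np.eye(n)
    # With W_k = 0 every candidate scores equally, so the loss equals log(N).
    print(info_nce_loss(latents, latents, np.zeros((n, n))))
    # With well-aligned predictions the positive dominates and the loss nears 0.
    print(info_nce_loss(latents, latents, 10.0 * np.eye(n)))
```

The chance-level value log(N) is a useful sanity check: a CPC encoder that learns nothing stays near it, and any transferable structure shows up as a drop below it.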

📝 Abstract

Specialized foundation models are beginning to emerge in various medical subdomains, but pretraining methodologies and parametric scaling with the size of the pretraining dataset are rarely assessed systematically and in a like-for-like manner. This work focuses on foundation models for electrocardiography (ECG) data, one of the most widely captured physiological time series worldwide. We present a comprehensive assessment of pretraining methodologies, covering five different contrastive and non-contrastive self-supervised learning objectives for ECG foundation models, and investigate their scaling behavior with pretraining dataset sizes up to 11M input samples, exclusively from publicly available sources. Pretraining strategy has a meaningful and consistent impact on downstream performance, with contrastive predictive coding (slightly ahead of JEPA) yielding the most transferable representations across diverse clinical tasks. Scaling pretraining data continues to yield meaningful improvements up to 11M samples for most objectives. We also compare model architectures across all pretraining methodologies and find clear evidence that structured state space models outperform transformers and CNNs. We hypothesize that the strong inductive biases of structured state space models, rather than pretraining scale alone, are the primary driver of effective ECG representation learning, with important implications for future foundation model development in this and potentially other physiological signal domains.
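To make the "inductive bias" of structured state space models concrete, the sketch below implements a generic diagonal linear SSM layer. It is an illustration in the spirit of S4D-style models, not the architecture used in this work, which the abstract does not specify. Assumptions: a real-valued diagonal state matrix (published SSMs often use complex values), zero-order-hold discretization, and hypothetical function names. It shows that the layer is both a linear recurrence and a long convolution whose kernel decays exponentially, a built-in prior for smooth, quasi-periodic signals such as ECG.

```python
import numpy as np


def discretize_zoh(a: np.ndarray, b: np.ndarray, dt: float):
    """Zero-order-hold discretization of a diagonal SSM x'(t) = a*x(t) + b*u(t).

    a must be nonzero (negative values give a stable, decaying memory).
    """
    a_bar = np.exp(dt * a)
    b_bar = (a_bar - 1.0) / a * b
    return a_bar, b_bar


def ssm_recurrent(u: np.ndarray, a_bar: np.ndarray, b_bar: np.ndarray, c: np.ndarray) -> np.ndarray:
    """Recurrent view: x_t = a_bar * x_{t-1} + b_bar * u_t,  y_t = c . x_t."""
    x = np.zeros_like(a_bar)
    ys = []
    for u_t in u:
        x = a_bar * x + b_bar * u_t
        ys.append(float(c @ x))
    return np.array(ys)


def ssm_convolution(u: np.ndarray, a_bar: np.ndarray, b_bar: np.ndarray, c: np.ndarray) -> np.ndarray:
    """Convolutional view: y = K * u with kernel K_k = sum_n c_n * a_bar_n**k * b_bar_n."""
    length = len(u)
    powers = a_bar[None, :] ** np.arange(length)[:, None]  # (L, N): a_bar_n**k
    kernel = powers @ (c * b_bar)                          # (L,) convolution kernel
    return np.convolve(u, kernel)[:length]                 # causal part only


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = -np.array([0.5, 1.0, 2.0, 4.0])    # hypothetical decay rates per state
    b = np.ones(4)
    c = rng.normal(size=4)
    a_bar, b_bar = discretize_zoh(a, b, dt=0.1)
    signal = rng.normal(size=200)           # stand-in for one ECG lead
    y_rec = ssm_recurrent(signal, a_bar, b_bar, c)
    y_conv = ssm_convolution(signal, a_bar, b_bar, c)
    print(np.allclose(y_rec, y_conv))
```

The two views compute the same output: the recurrence suits streaming inference, while the convolution allows parallel training over long sequences. This duality is one reason SSMs handle long physiological recordings efficiently.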

Problem

ECG foundation models

pretraining strategies

scaling laws

self-supervised learning

physiological time series

Innovation

ECG foundation models

self-supervised learning

structured state space models