Exploiting a Mixture-of-Layers in an Electrocardiography Foundation Model

📅 2025-08-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing ECG foundation models typically use only the final-layer representations of Vision Transformers (ViTs), under-exploiting hierarchical feature information. Method: We propose Post-pretraining Mixture-of-layers Aggregation (PMA), a novel architecture that employs a learnable gating network to dynamically weight and fuse hidden representations across all ViT layers; additionally, we introduce a grouped mean aggregation strategy during the pretraining stage to enhance inter-layer diversity modeling. Contribution/Results: PMA systematically uncovers and exploits the complementary nature of multi-layer representations in ECG Transformers, moving beyond the conventional single-layer representation paradigm. Extensive experiments across diverse downstream tasks, including arrhythmia classification and abnormality detection, demonstrate that PMA consistently outperforms strong baselines, validating the effectiveness, robustness, and generalizability of adaptive multi-layer representation fusion for ECG analysis.

📝 Abstract
Transformer-based foundation models for Electrocardiograms (ECGs) have recently achieved impressive performance in many downstream applications. However, the internal representations of such models across layers have not been fully understood and exploited. An important question arises: does the final layer of the pre-trained Transformer model, the de facto representational layer, provide optimal performance for downstream tasks? Our empirical and theoretical analyses answer this question in the negative, and we therefore propose a novel approach to effectively leverage the representation diversity of the model's layers. Specifically, we introduce a novel architecture called Post-pretraining Mixture-of-layers Aggregation (PMA), which enables a flexible combination of the layer-wise representations from the layer stack of a Transformer-based foundation model. We first pre-train the model on ECG signals using a 1-dimensional Vision Transformer (ViT) via masked modeling. In downstream applications, instead of relying solely on the last layer of the model, we employ a gating network to selectively fuse the representations from the pretrained model's layers, thereby enhancing representation power and improving downstream performance. In addition, we extend the proposed method to the pretraining stage by aggregating all representations through group-wise averaging before feeding them into the Transformer-based decoder.
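The gated layer fusion described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the names (mixture_of_layers, gate_logits) are hypothetical, the gate is a plain softmax over per-layer logits rather than a full gating network, and training of the gate alongside the downstream head is omitted.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax."""
    e = np.exp(x - x.max())
    return e / e.sum()

def mixture_of_layers(layer_reps, gate_logits):
    """Fuse per-layer representations with learnable gate weights.

    layer_reps:  (L, D) array, one D-dim embedding per Transformer layer
    gate_logits: (L,) array of learnable scores (hypothetical; in the paper
                 a gating network produces these, trained with the task head)
    """
    weights = softmax(gate_logits)   # convex combination over the L layers
    return weights @ layer_reps      # (D,) fused representation

# Toy example: 12 ViT layers, 8-dim embeddings
rng = np.random.default_rng(0)
reps = rng.normal(size=(12, 8))
fused = mixture_of_layers(reps, np.zeros(12))   # uniform gate == layer mean
print(np.allclose(fused, reps.mean(axis=0)))    # True
```

With all-zero logits the gate degenerates to a simple mean over layers; training the logits lets the model up-weight whichever layers carry the most task-relevant ECG features.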
Problem

Research questions and friction points this paper is trying to address.

Optimizing ECG foundation model layer representations for downstream tasks
Leveraging Transformer layer diversity beyond final layer output
Enhancing ECG analysis via selective layer fusion architecture
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture-of-layers aggregation architecture
Gating network for layer representation fusion
Group-wise averaging in pretraining stage
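The group-wise averaging listed above, which the abstract applies before the decoder during pretraining, can be sketched as follows. This is a hypothetical minimal version (the function name grouped_mean and the contiguous-grouping choice are assumptions, not taken from the paper).

```python
import numpy as np

def grouped_mean(layer_reps, n_groups):
    """Average layer representations within contiguous groups.

    layer_reps: (L, D) array of per-layer embeddings; L must be
                divisible by n_groups.
    Returns:    (n_groups, D), one summary embedding per group,
                which would then be fed to the decoder.
    """
    L, D = layer_reps.shape
    assert L % n_groups == 0, "layer count must split evenly into groups"
    return layer_reps.reshape(n_groups, L // n_groups, D).mean(axis=1)

# Toy example: 6 layers, 4-dim embeddings, 3 groups of 2 layers each
reps = np.arange(24, dtype=float).reshape(6, 4)
groups = grouped_mean(reps, 3)   # shape (3, 4)
```

Averaging within groups rather than over all layers keeps some depth-wise diversity (shallow vs. deep features) while still compressing the layer stack.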
Phu X. Nguyen
KU Leuven
Machine Learning, Speech/Audio, Biomedical Engineering, Wireless Networks/5G, Optimization
Huy Phan
Meta Reality Labs, Paris 75002, France
Hieu Pham
VinUni-Illinois Smart Health Center, VinUniversity, Hanoi, Vietnam
Christos Chatzichristos
STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, Department of Electrical Engineering (ESAT), KU Leuven, Leuven 3001, Belgium
Bert Vandenberk
Department of Cardiovascular Sciences, KU Leuven, Leuven 3001, Belgium
Maarten De Vos
ESAT - Stadius & Department of Development and Regeneration, KU Leuven, Belgium
mobile EEG, digital biomarkers, sleep, AI, clinical decision support