Exploiting a Mixture-of-Layers in an Electrocardiography Foundation Model

📅 2025-08-27

🤖 AI Summary
Problem: Existing ECG foundation models typically use only the final-layer representation of a Vision Transformer (ViT), leaving the hierarchical features of the other layers underexploited. Method: We propose Post-pretraining Mixture-of-layers Aggregation (PMA), an architecture in which a learnable gating network dynamically weights and fuses hidden representations from all ViT layers; additionally, we introduce a group-wise mean aggregation of layer representations during pretraining to better exploit inter-layer diversity. Contribution/Results: PMA leverages the complementary nature of multi-layer representations in ECG Transformers instead of relying on a single representational layer. Experiments on downstream tasks, including arrhythmia classification and abnormality detection, show that PMA outperforms strong baselines, supporting adaptive multi-layer representation fusion for ECG analysis.

📝 Abstract
Transformer-based foundation models for electrocardiograms (ECGs) have recently achieved impressive performance in many downstream applications. However, the internal representations of such models across layers have not been fully understood or exploited. An important question arises: does the final layer of the pre-trained Transformer model, the de facto representational layer, provide optimal performance for downstream tasks? Our empirical and theoretical analyses answer this question in the negative, and we therefore propose a novel approach that effectively leverages the representation diversity of the model's layers. Specifically, we introduce an architecture called Post-pretraining Mixture-of-layers Aggregation (PMA), which enables a flexible combination of the layer-wise representations from the layer stack of a Transformer-based foundation model. We first pre-train the model on ECG signals using a 1-dimensional Vision Transformer (ViT) via masked modeling. In downstream applications, instead of relying solely on the last layer of the model, we employ a gating network to selectively fuse the representations from the pre-trained model's layers, thereby enhancing representation power and improving downstream performance. In addition, we extend the proposed method to the pretraining stage by aggregating all representations through group-wise averaging before feeding them into the decoder-based Transformer.
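
The gated fusion described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the bias-free linear gate, and the choice to feed the gate the mean of the per-layer embeddings are all assumptions made for illustration. The sketch shows only the general idea of computing softmax weights over layers and taking a weighted sum of their representations.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gated_layer_fusion(layer_reprs, gate_weights):
    """Fuse per-layer embeddings with input-dependent softmax weights.

    layer_reprs:  L vectors of length D, one pooled embedding per layer.
    gate_weights: L vectors of length D (hypothetical linear gate, no bias).
    Returns (fused vector of length D, list of L mixing weights).
    """
    num_layers = len(layer_reprs)
    dim = len(layer_reprs[0])
    # Assumption: the gate sees the mean embedding across layers.
    gate_input = [sum(h[d] for h in layer_reprs) / num_layers for d in range(dim)]
    # One logit per layer, then normalize so the weights sum to 1.
    logits = [sum(w * x for w, x in zip(row, gate_input)) for row in gate_weights]
    weights = softmax(logits)
    # Weighted sum of the layer embeddings, dimension by dimension.
    fused = [
        sum(weights[l] * layer_reprs[l][d] for l in range(num_layers))
        for d in range(dim)
    ]
    return fused, weights


# Toy example: 4 layers, 3-dimensional embeddings, random values.
rng = random.Random(0)
num_layers, dim = 4, 3
layer_reprs = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_layers)]
gate_weights = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_layers)]
fused, weights = gated_layer_fusion(layer_reprs, gate_weights)
```

With an all-zero gate the weights become uniform and the fused vector reduces to a plain average over layers, whereas using only the final layer corresponds to putting all the weight on the last entry. A learned gate can place its weights anywhere between these two extremes.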
Problem

Research questions and friction points this paper is trying to address.

Optimizing ECG foundation model layer representations for downstream tasks
Leveraging Transformer layer diversity beyond final layer output
Enhancing ECG analysis via selective layer fusion architecture
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture-of-layers aggregation architecture
Gating network for layer representation fusion
Group-wise averaging in pretraining stage
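
Group-wise averaging of layer representations might look like the following sketch. The source does not specify how layers are assigned to groups, so the function name `groupwise_average`, the use of equally sized groups of consecutive layers, and the requirement that the layer count divide evenly are assumptions.

```python
def groupwise_average(layer_reprs, num_groups):
    """Average consecutive layers within equally sized groups.

    layer_reprs: L vectors of length D, one embedding per layer.
    num_groups:  number of groups G; assumed here to divide L evenly.
    Returns G vectors of length D, one mean embedding per group.
    """
    num_layers = len(layer_reprs)
    if num_groups <= 0 or num_layers % num_groups != 0:
        raise ValueError("num_groups must be positive and divide the layer count")
    group_size = num_layers // num_groups
    dim = len(layer_reprs[0])
    grouped = []
    for start in range(0, num_layers, group_size):
        group = layer_reprs[start:start + group_size]
        # Element-wise mean over the layers in this group.
        grouped.append([sum(h[d] for h in group) / group_size for d in range(dim)])
    return grouped


# Toy example: 4 layers split into 2 groups of 2 consecutive layers.
layers = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
grouped = groupwise_average(layers, num_groups=2)
# grouped == [[2.0, 3.0], [6.0, 7.0]]
```

Setting `num_groups` to 1 collapses all layers into a single mean representation, while setting it to the number of layers leaves each layer untouched.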