How Much Temporal Modeling is Enough? A Systematic Study of Hybrid CNN-RNN Architectures for Multi-Label ECG Classification

📅 2026-01-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses three challenges in multi-label electrocardiogram (ECG) classification (disease co-occurrence, class imbalance, and long-range temporal dependencies) and examines whether deep recurrent architectures are actually necessary, a question that has not been systematically validated. In controlled experiments on the PTB-XL dataset, the authors evaluate hybrid models that combine a CNN with various recurrent structures (LSTM, GRU, BiLSTM, and their stacked variants). The results indicate that increasing recurrent depth yields diminishing returns and risks overfitting. A lightweight CNN coupled with a single-layer BiLSTM gives the best trade-off between performance and complexity, outperforming deeper recurrent combinations on Hamming loss (0.0338), macro-AUPRC (0.4715), micro-F1 (0.6979), and subset accuracy (0.5723). This supports parsimonious temporal modeling that matches the intrinsic temporal structure of ECG signals.

📝 Abstract
Accurate multi-label classification of electrocardiogram (ECG) signals remains challenging due to the coexistence of multiple cardiac conditions, pronounced class imbalance, and long-range temporal dependencies in multi-lead recordings. Although recent studies increasingly rely on deep and stacked recurrent architectures, the necessity and clinical justification of such architectural complexity have not been rigorously examined. In this work, we perform a systematic comparative evaluation of convolutional neural networks (CNNs) combined with multiple recurrent configurations, including LSTM, GRU, Bidirectional LSTM (BiLSTM), and their stacked variants, for multi-label ECG classification on the PTB-XL dataset comprising 23 diagnostic categories. The CNN component serves as a morphology-driven baseline, while recurrent layers are progressively integrated to assess their contribution to temporal modeling and generalization performance. Experimental results indicate that a CNN integrated with a single BiLSTM layer achieves the most favorable trade-off between predictive performance and model complexity. This configuration attains superior Hamming loss (0.0338), macro-AUPRC (0.4715), micro-F1 score (0.6979), and subset accuracy (0.5723) compared with deeper recurrent combinations. Although stacked recurrent models occasionally improve recall for specific rare classes, our results provide empirical evidence that increasing recurrent depth yields diminishing returns and may degrade generalization due to reduced precision and overfitting. These findings suggest that architectural alignment with the intrinsic temporal structure of ECG signals, rather than increased recurrent depth, is a key determinant of robust performance and clinically relevant deployment.
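The abstract reports Hamming loss, micro-F1, and subset accuracy without defining them. The sketch below shows one common way these three multi-label metrics are computed from binary label matrices. It is an illustration under stated assumptions, not the authors' evaluation code: the function names and the toy data are hypothetical, predictions are assumed to be already thresholded to 0/1, and macro-AUPRC is left out because it needs per-class probability scores rather than hard predictions.

```python
from typing import Sequence

Labels = Sequence[Sequence[int]]  # one row per ECG record, one 0/1 column per class


def hamming_loss(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of individual label decisions that are wrong."""
    wrong = sum(t != p for tr, pr in zip(y_true, y_pred) for t, p in zip(tr, pr))
    total = sum(len(row) for row in y_true)
    return wrong / total


def subset_accuracy(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of records whose full label vector is predicted exactly."""
    exact = sum(list(tr) == list(pr) for tr, pr in zip(y_true, y_pred))
    return exact / len(y_true)


def micro_f1(y_true: Labels, y_pred: Labels) -> float:
    """F1 computed from true/false positives pooled over all classes."""
    tp = fp = fn = 0
    for tr, pr in zip(y_true, y_pred):
        for t, p in zip(tr, pr):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy data: 4 records, 3 classes (not PTB-XL, not the paper's results).
Y_TRUE = [[1, 0, 0], [0, 1, 1], [1, 1, 0], [0, 0, 1]]
Y_PRED = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [1, 0, 1]]

print(hamming_loss(Y_TRUE, Y_PRED))
print(subset_accuracy(Y_TRUE, Y_PRED))
print(micro_f1(Y_TRUE, Y_PRED))
```

On the toy data, 2 of the 12 label decisions are wrong (Hamming loss of about 0.167), 2 of the 4 records match exactly (subset accuracy of 0.5), and the pooled counts are 5 true positives, 1 false positive, and 1 false negative (micro-F1 of about 0.833). The gap between the last two values illustrates why subset accuracy is the strictest of the reported metrics: a single wrong label makes the whole record count as an error.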
Problem

Research questions and friction points this paper is trying to address.

multi-label ECG classification
temporal dependencies
class imbalance
architectural complexity
clinical justification
Innovation

Methods, ideas, or system contributions that make the work stand out.

hybrid CNN-RNN
multi-label ECG classification
temporal modeling
BiLSTM
model complexity
Alireza Jafari
Post-doctoral researcher, National Cheng Kung University
Control theory, robotics
Fatemeh Jafari
Department of Computer Engineering, Faculty of Engineering and Technology, Islamic Azad University, Gorgan Branch, Gorgan, Iran