Exploring Best Practices for ECG Pre-Processing in Machine Learning

📅 2023-11-02
📈 Citations: 3
Influential: 0
🤖 AI Summary
This study addresses the lack of consensus on preprocessing strategies for multi-label ECG-based cardiac disease classification. We systematically evaluate the impact of downsampling, normalization, and bandpass filtering on three state-of-the-art time-series classifiers—InceptionTime, TS-Transformer, and ROCKET—across three benchmark ECG datasets. Results show that 50 Hz sampling achieves performance comparable to 500 Hz, reducing model parameter count and training time by ~90%; min-max normalization slightly degrades accuracy, while IIR/FIR bandpass filtering yields no significant improvement; Z-score normalization demonstrates robustness. Crucially, we empirically refute both the “preprocessing-irrelevance” and “blind-preprocessing-effectiveness” hypotheses, providing the first evidence that preprocessing must be task-adaptive: sensitivity varies significantly across cardiac disease subtypes and model architectures. Our findings establish a reproducible, lightweight, and task-driven preprocessing paradigm for ECG-based intelligent diagnosis.
📝 Abstract
In this work we search for best practices in pre-processing of Electrocardiogram (ECG) signals in order to train better classifiers for the diagnosis of heart conditions. State of the art machine learning algorithms have achieved remarkable results in classification of some heart conditions using ECG data, yet there appears to be no consensus on pre-processing best practices. Is this lack of consensus due to different conditions and architectures requiring different processing steps for optimal performance? Is it possible that state of the art deep-learning models have rendered pre-processing unnecessary? In this work we apply down-sampling, normalization, and filtering functions to 3 different multi-label ECG datasets and measure their effects on 3 different high-performing time-series classifiers. We find that sampling rates as low as 50Hz can yield comparable results to the commonly used 500Hz. This is significant as smaller sampling rates will result in smaller datasets and models, which require less time and resources to train. Additionally, despite their common usage, we found min-max normalization to be slightly detrimental overall, and band-passing to make no measurable difference. We found the blind approach to pre-processing of ECGs for multi-label classification to be ineffective, with the exception of sample rate reduction which reliably reduces computational resources, but does not increase accuracy.
Problem

Research questions and friction points this paper is trying to address.

Identifying optimal ECG pre-processing steps for machine learning
Evaluating impact of sampling rates on classifier performance
Assessing necessity of normalization and filtering in ECG analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Down-sampling ECG to 50 Hz reduces computational resources
Min-max normalization slightly harms classification performance
Band-pass filtering shows no measurable improvement
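The three preprocessing operations compared in the paper (down-sampling, normalization, band-pass filtering) can be sketched as a small pipeline. This is an illustrative reconstruction, not the authors' code: the function name `preprocess_ecg`, the 4th-order Butterworth filter, and the 0.5–40 Hz cutoffs are assumptions; the paper's reported finding motivates the defaults (resample to 50 Hz, z-score on, band-pass off).

```python
import numpy as np
from scipy import signal

def preprocess_ecg(ecg, fs=500, target_fs=50, band=(0.5, 40.0), bandpass=False):
    """Down-sample, optionally band-pass filter, and z-score an ECG lead.

    ecg: 1-D array sampled at fs Hz. Defaults reflect the paper's findings:
    50 Hz resampling with z-score normalization, band-pass disabled.
    """
    if bandpass:
        # 4th-order Butterworth band-pass (hypothetical cutoff choices)
        nyq = fs / 2.0
        b, a = signal.butter(4, [band[0] / nyq, band[1] / nyq], btype="band")
        ecg = signal.filtfilt(b, a, ecg)
    # Polyphase resampling from fs down to target_fs
    ecg = signal.resample_poly(ecg, target_fs, fs)
    # Z-score normalization (the robust choice per the summary above)
    return (ecg - ecg.mean()) / (ecg.std() + 1e-8)

# Example: 10 s of synthetic 1 Hz sine at 500 Hz -> 500 samples at 50 Hz
x = np.sin(2 * np.pi * 1.0 * np.arange(5000) / 500.0)
y = preprocess_ecg(x)
print(y.shape)  # (500,)
```

Note the 10x reduction in sequence length, which is what drives the reported savings in model size and training time.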
Amir Salimi
University of Alberta, Computing Science, Edmonton, Alberta, Canada
S. Kalmady
University of Alberta, Canadian Vigour Centre, Edmonton, Alberta, Canada
A. Hindle
University of Alberta, Computing Science, Edmonton, Alberta, Canada
Osmar Zaiane
University of Alberta - Alberta Machine Intelligence Institute, Canada CIFAR AI Chair
Data Mining · Social Network Analysis · Health Informatics · Data Privacy · Big Data Analytics
Padma Kaul
University of Alberta, Canadian Vigour Centre, Edmonton, Alberta, Canada