Deriving Health Metrics from the Photoplethysmogram: Benchmarks and Insights from MIMIC-III-Ext-PPG

📅 2026-03-23
🤖 AI Summary
This study addresses the lack of large-scale, high-quality clinical benchmarks for photoplethysmography (PPG)-based algorithms, which has hindered fair model comparison. The authors present a unified PPG benchmark for multitask clinical prediction. It covers arrhythmia classification, including what they describe as the first systematic evaluation beyond atrial fibrillation and atrial flutter, and regression of respiratory rate (RR), heart rate (HR), and blood pressure (BP). Using the MIMIC-III-Ext-PPG dataset, they train established deep learning architectures and run cross-dataset validation. Atrial fibrillation detection reaches an AUROC of 0.96 (0.97 cross-dataset), and the reported mean absolute errors (MAE) are 2.97 breaths/min for RR, 1.13 beats/min for HR, and 16.13/8.70 mmHg for systolic/diastolic BP (SBP/DBP). Subgroup analysis shows marked performance differences across BP, HR, and demographic strata, which the authors attribute to population-specific waveform characteristics rather than systematic model bias.

📝 Abstract
Photoplethysmography (PPG) is one of the most widely captured biosignals for clinical prediction tasks, yet PPG-based algorithms are typically trained on small-scale datasets of uncertain quality, which hinders meaningful algorithm comparisons. We present a comprehensive benchmark for PPG-based clinical prediction using the MIMIC-III-Ext-PPG dataset, establishing baselines across the full spectrum of clinically relevant applications: multi-class heart rhythm classification, and regression of physiological parameters including respiratory rate (RR), heart rate (HR), and blood pressure (BP). Most notably, we provide the first comprehensive assessment of PPG for general arrhythmia detection beyond atrial fibrillation (AF) and atrial flutter (AFLT), with performance stratified by BP, HR, and demographic subgroups. Using established deep learning architectures, we achieved strong performance for AF detection (AUROC = 0.96) and accurate physiological parameter estimation (RR MAE: 2.97 breaths/min; HR MAE: 1.13 beats/min; SBP/DBP MAE: 16.13/8.70 mmHg). Cross-dataset validation demonstrates excellent generalizability for AF detection (AUROC = 0.97), while clinical subgroup analysis reveals marked performance differences across BP, HR, and demographic strata. These variations appear to reflect population-specific waveform differences rather than systematic bias in model behavior. This framework establishes the first integrated benchmark for multi-task PPG-based clinical prediction, demonstrating that PPG signals can effectively support multiple simultaneous monitoring tasks and providing essential baselines for future algorithm development.
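The abstract reports two kinds of metric: mean absolute error (MAE) for the regression tasks and AUROC for AF detection. As a minimal sketch of what these numbers measure, the Python below computes both from scratch. The function names and the toy values are illustrative assumptions; this is not the authors' evaluation code, and the benchmark itself may aggregate results differently (for example per patient or per recording).

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between reference and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive example is scored
    higher than a random negative example (ties count as one half)."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy values, not taken from the paper.
rr_reference = [12.0, 18.0, 20.0]   # breaths/min
rr_predicted = [14.0, 17.0, 20.0]
af_labels = [0, 0, 1, 1]            # 1 = AF, 0 = not AF
af_scores = [0.1, 0.4, 0.35, 0.8]   # model output probabilities

print(mean_absolute_error(rr_reference, rr_predicted))
print(auroc(af_labels, af_scores))
```

The pairwise AUROC form is quadratic in the number of samples, so it is only suited to small illustrations; rank-based implementations are the usual choice at dataset scale.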
Problem

Research questions and friction points this paper is trying to address.

Photoplethysmography
clinical prediction
arrhythmia detection
physiological parameter estimation
algorithm benchmarking
Innovation

Methods, ideas, or system contributions that make the work stand out.

photoplethysmography
clinical benchmark
arrhythmia detection
physiological parameter estimation
deep learning
Mohammad Moulaeifard
ML Engineer / Researcher
Philip J. Aston
Department of Data Science and AI, National Physical Laboratory, Teddington, United Kingdom; School of Mathematics and Physics, University of Surrey, Guildford, United Kingdom
Peter H. Charlton
Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom
Nils Strodthoff
Professor for eHealth/AI4Health, Oldenburg University, Germany
Machine Learning, Deep Learning, Biomedical Data Analysis