🤖 AI Summary
Current photoplethysmography (PPG)-based cuffless blood pressure (BP) estimation models suffer from poor cross-dataset generalizability due to distribution shifts—particularly in BP distributions—between source and target domains. Method: We systematically evaluate five deep learning architectures (e.g., XResNet1d101) on PulseDB (source) and multiple external datasets (targets), establishing the first benchmark for cross-center generalization of PPG-based BP models. We further propose a lightweight, sample-level domain adaptation method that operates without target-domain labels. Contribution/Results: With subject-specific calibration on the source domain, mean absolute errors (MAEs) for systolic/diastolic BP (SBP/DBP) are 9.4/6.0 mmHg; without calibration, errors degrade substantially to 15.0–25.1/7.0–10.4 mmHg across external datasets. Our method improves cross-domain robustness, reducing this performance degradation and supporting more reliable, reproducible clinical deployment of PPG-based BP estimation.
📝 Abstract
Photoplethysmography (PPG)-based blood pressure (BP) estimation represents a promising alternative to cuff-based BP measurement. Recently, an increasing number of deep learning models have been proposed to infer BP from the raw PPG waveform. However, these models have been evaluated predominantly on in-distribution test sets, raising the question of how well they generalize to external datasets. To investigate this question, we trained five deep learning models on the recently released PulseDB dataset, provided in-distribution benchmarking results on this dataset, and then assessed out-of-distribution performance on several external datasets. The best model (XResNet1d101) achieved in-distribution MAEs of 9.4 and 6.0 mmHg for systolic and diastolic BP respectively on PulseDB (with subject-specific calibration), and 14.0 and 8.5 mmHg respectively without calibration. Corresponding MAEs on external test datasets without calibration ranged from 15.0 to 25.1 mmHg (SBP) and 7.0 to 10.4 mmHg (DBP). Our results indicate that performance is strongly influenced by differences in BP distributions between datasets. We investigated a simple way of improving performance through sample-based domain adaptation and put forward recommendations for training models with good generalization properties. With this work, we hope to raise awareness among researchers of the importance and challenges of out-of-distribution generalization.
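To make the evaluation setup concrete, the sketch below shows the reported metric (MAE in mmHg) and one plausible label-free, sample-level preprocessing step: per-window z-normalization of the PPG input, which removes amplitude and offset shifts between acquisition devices. This is an illustrative assumption, not the paper's exact adaptation method; the function names are hypothetical.

```python
import numpy as np

def standardize_samples(ppg: np.ndarray) -> np.ndarray:
    """Per-sample (instance-level) z-normalization of PPG windows.

    A label-free, sample-level adaptation step: each window is rescaled
    using its own mean and standard deviation, so amplitude/offset shifts
    between datasets do not reach the model. Illustrative sketch only;
    the paper's actual sample-based method may differ.
    """
    mu = ppg.mean(axis=-1, keepdims=True)
    sigma = ppg.std(axis=-1, keepdims=True) + 1e-8  # avoid division by zero
    return (ppg - mu) / sigma

def mae(pred: np.ndarray, true: np.ndarray) -> float:
    """Mean absolute error in mmHg, the metric reported in the abstract."""
    return float(np.mean(np.abs(pred - true)))

# Example: SBP predictions of 120 and 80 mmHg against references 125 and 70
# give an MAE of (5 + 10) / 2 = 7.5 mmHg.
print(mae(np.array([120.0, 80.0]), np.array([125.0, 70.0])))  # → 7.5
```

Per-sample normalization requires no target-domain labels, which is why simple input-level transforms of this kind are attractive for cross-dataset deployment.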