Extending Stress Detection Reproducibility to Consumer Wearable Sensors

📅 2025-05-09
🤖 AI Summary
This study addresses the reproducibility and generalizability bottlenecks in stress detection with consumer-grade wearable devices. We evaluate a consumer wearable, the Garmin Forerunner 55, against research-grade devices (Biopac MP160, Polar H10, Empatica E4). In a lab study, 35 university students completed three standardized stress-induction tasks, including mental arithmetic, while HRV and EDA signals were recorded. Performance is measured by AUROC, both under leave-one-subject-out (LOSO) cross-validation and with a pre-trained stress detection tool. To our knowledge, this is the first work to quantify reproducibility for consumer wearables in stress detection. We identify hardware-model compatibility as a critical determinant of generalization: the Empatica E4 reached an AUROC of up to 0.953 in LOSO but only 0.723 with the pre-trained tool. The Forerunner 55 achieved an LOSO AUROC of up to 0.961 for mental arithmetic stress, comparable to the Polar H10 (0.954) and the Empatica E4 (0.905 with HRV only, 0.953 with HRV+EDA). Combining HRV and EDA improved prediction in most scenarios. The Biopac MP160 performed best overall, consistent with its role as the gold standard.

📝 Abstract
Wearable sensors are widely used to collect physiological data and develop stress detection models. However, most studies focus on a single dataset, rarely evaluating model reproducibility across devices, populations, or study conditions. We previously assessed the reproducibility of stress detection models across multiple studies, testing models trained on one dataset against others using heart rate (with R-R interval) and electrodermal activity (EDA). In this study, we extended our stress detection reproducibility assessment to consumer wearable sensors. We compared validated research-grade devices (Biopac MP160, Polar H10, Empatica E4) to a consumer wearable, the Garmin Forerunner 55s, assessing device-specific stress detection performance in a new stress study with undergraduate students. Thirty-five students completed three standardized stress-induction tasks in a lab setting. Biopac MP160 performed the best, consistent with our expectation of it as the gold standard, though performance varied across devices and models. Combining heart rate variability (HRV) and EDA enhanced stress prediction across most scenarios. However, Empatica E4 showed variability: while combining HRV and EDA improved stress detection in leave-one-subject-out (LOSO) evaluations (AUROC up to 0.953), device-specific limitations led to underperformance when tested with our pre-trained stress detection tool (AUROC 0.723), highlighting generalizability challenges related to hardware-model compatibility. Garmin Forerunner 55s demonstrated strong potential for real-world stress monitoring, achieving the best mental arithmetic stress detection performance in LOSO (AUROC up to 0.961), comparable to research-grade devices such as Polar H10 (AUROC 0.954) and Empatica E4 (AUROC 0.905 with the HRV-only model and 0.953 with the HRV+EDA model), with the added advantage of consumer-friendly wearability for free-living contexts.
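The LOSO evaluation and AUROC metric named in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration: the data are synthetic, the single-feature nearest-centroid scorer stands in for whatever models the authors trained, and averaging per-subject AUROCs is one possible aggregation (pooling held-out scores is another). The function names (`auroc`, `loso_auroc`, `make_synthetic_samples`) are hypothetical.

```python
import random
from statistics import mean


def auroc(labels, scores):
    """AUROC as the chance a stress sample outscores a rest sample (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_centroids(train):
    """Toy model (assumption): class means of a single feature."""
    rest = mean(x for _, x, y in train if y == 0)
    stress = mean(x for _, x, y in train if y == 1)
    return rest, stress


def stress_score(x, centroids):
    """Higher when x is closer to the stress centroid than to the rest centroid."""
    rest, stress = centroids
    return (x - rest) ** 2 - (x - stress) ** 2


def loso_auroc(samples):
    """Hold out each subject in turn; return the mean of per-subject AUROCs."""
    subjects = sorted({sid for sid, _, _ in samples})
    fold_scores = []
    for held_out in subjects:
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        centroids = fit_centroids(train)  # fitted without the held-out subject
        labels = [y for _, _, y in test]
        scores = [stress_score(x, centroids) for _, x, _ in test]
        fold_scores.append(auroc(labels, scores))
    return mean(fold_scores)


def make_synthetic_samples(n_subjects=6, n_windows=20, seed=0):
    """Synthetic (subject_id, feature, label) rows; not data from the study."""
    rng = random.Random(seed)
    rows = []
    for sid in range(n_subjects):
        baseline = rng.gauss(0.0, 0.5)  # per-subject offset
        for label in (0, 1):  # 0 = rest, 1 = stress
            for _ in range(n_windows):
                rows.append((sid, baseline + 2.0 * label + rng.gauss(0.0, 1.0), label))
    return rows


print(f"LOSO AUROC on synthetic data: {loso_auroc(make_synthetic_samples()):.3f}")
```

Because the model is refit without the held-out subject in every fold, the score reflects how well it transfers to an unseen person, which is the property the LOSO figures above are meant to capture.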
Problem

Research questions and friction points this paper is trying to address.

Assessing stress detection reproducibility across consumer wearable sensors
Comparing performance of research-grade and consumer wearables for stress monitoring
Evaluating hardware-model compatibility challenges in stress prediction generalizability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extend stress detection to consumer wearables
Combine HRV and EDA for better prediction
Compare research-grade and consumer device performance
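As a rough illustration of combining HRV and EDA, the sketch below derives a few common time-domain HRV features from R-R intervals and two simple EDA summaries, then merges them into one feature set per window. The choice of features (mean heart rate, SDNN, RMSSD, EDA mean and range) and the concatenation-style fusion are assumptions for illustration; this page does not state which features or fusion method the authors used, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def hrv_features(rr_ms):
    """Time-domain HRV features from R-R intervals in milliseconds."""
    if len(rr_ms) < 2:
        raise ValueError("need at least two R-R intervals")
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]  # successive differences
    return {
        "mean_hr_bpm": 60000.0 / mean(rr_ms),
        "sdnn_ms": stdev(rr_ms),
        "rmssd_ms": sqrt(mean(d * d for d in diffs)),
    }


def eda_features(eda_us):
    """Simple EDA summaries from skin conductance samples in microsiemens."""
    return {"eda_mean_us": mean(eda_us), "eda_range_us": max(eda_us) - min(eda_us)}


def fuse(rr_ms, eda_us):
    """Early fusion (assumption): one merged feature dict per window."""
    return {**hrv_features(rr_ms), **eda_features(eda_us)}


# Toy window with made-up values, not measurements from the study.
window = fuse([800, 810, 790, 800], [2.0, 2.5, 3.0])
print(sorted(window))
```

An HRV-only model would use just the output of `hrv_features`, which mirrors the HRV-only versus HRV+EDA comparison reported for the Empatica E4.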
Authors

Ohida Binte Amin
Khoury College of Computer Sciences, Northeastern University, Boston, MA; Bouvé College of Health Sciences, Northeastern University, Boston, MA
Varun Mishra
Northeastern University
Mobile Sensing, mHealth
Tinashe M. Tapera
Khoury College of Computer Sciences, Northeastern University, Boston, MA; Bouvé College of Health Sciences, Northeastern University, Boston, MA
Robert Volpe
Bouvé College of Health Sciences, Northeastern University, Boston, MA
Aarti Sathyanarayana
Harvard University
Machine Learning, Health Informatics, Artificial Intelligence, Digital Biomarkers, Digital Phenotyping