EmpathicSchool: A multimodal dataset for real-time facial expressions and physiological data analysis under different stress conditions

📅 2022-08-29
🏛️ arXiv.org
📈 Citations: 9
Influential: 2
🤖 AI Summary
Existing affective computing research predominantly relies on single-modality data, lacking synchronized multimodal benchmark datasets for stress-response analysis. Method: The authors introduce EmpathicSchool, a synchronized multimodal stress dataset that concurrently captures facial video and physiological signals—including heart rate, electrodermal activity, and skin temperature—for nine signal types in total, recorded with a camera and a wearable sensor (Empatica E4). Data were collected from 20 participants over 26 hours of stress-inducing sessions, with temporal alignment, artifact correction, and signal-quality validation to ensure fidelity and cross-session consistency. Contribution/Results: A ResNet+LSTM fusion model trained on this dataset achieves 89.3% accuracy in stress-level classification, outperforming unimodal baselines. The work provides a cross-modal benchmark for stress recognition, addressing a gap in affective computing and multimodal behavioral physiology.
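The summary mentions temporal alignment of the modalities but gives no procedure. A minimal sketch of one common approach, assuming the E4's published stream rates (EDA at 4 Hz, BVP at 64 Hz) and ~30 fps video, is linear interpolation of each channel onto the video frame timeline; the rates and function names here are illustrative, not the paper's pipeline:

```python
import numpy as np

def align_to_timeline(t_src, x_src, t_common):
    """Resample a signal sampled at times t_src onto the common timeline."""
    return np.interp(t_common, t_src, x_src)

duration = 10.0                            # seconds of toy data
t_video = np.arange(0, duration, 1 / 30)   # ~30 fps video frame timestamps
t_eda   = np.arange(0, duration, 1 / 4)    # 4 Hz electrodermal activity (E4)
t_bvp   = np.arange(0, duration, 1 / 64)   # 64 Hz blood volume pulse (E4)

eda = np.random.default_rng(0).normal(size=t_eda.shape)
bvp = np.random.default_rng(1).normal(size=t_bvp.shape)

eda_aligned = align_to_timeline(t_eda, eda, t_video)
bvp_aligned = align_to_timeline(t_bvp, bvp, t_video)

# Every modality now shares the video frame timeline.
assert eda_aligned.shape == t_video.shape == bvp_aligned.shape
```

With all channels on one clock, per-frame feature vectors can be concatenated across modalities before classification.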
📝 Abstract
Affective computing has garnered researchers' attention and interest in recent years as there is a need for AI systems to better understand and react to human emotions. However, analyzing human emotions, such as mood or stress, is quite complex. While various stress studies use facial expressions and wearables, most existing datasets rely on processing data from a single modality. This paper presents EmpathicSchool, a novel dataset that captures facial expressions and the associated physiological signals, such as heart rate, electrodermal activity, and skin temperature, under different stress levels. Data were collected from 20 participants across multiple sessions, totaling 26 hours. The data includes nine different signal types, spanning both computer vision and physiological features that can be used to detect stress. In addition, various experiments were conducted to validate the signal quality.
Problem

Research questions and friction points this paper is trying to address.

Detecting human stress using multimodal data analysis
Addressing limitations of single-modality stress detection datasets
Validating facial expressions and physiological signals for stress recognition
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal dataset combining facial expressions and physiological signals
Captures nine signal types, including computer vision features
Validated through experiments assessing signal quality
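The dataset's intended use is window-level stress detection from the combined modalities. As a hedged illustration only (the window length, statistics, and signal names are assumptions, not the paper's feature set or its ResNet+LSTM model), per-modality windows can be summarized into simple statistical features and concatenated for a downstream classifier:

```python
import numpy as np

def window_features(signal, win):
    """Mean and standard deviation over non-overlapping windows."""
    n = len(signal) // win
    w = signal[: n * win].reshape(n, win)
    return np.stack([w.mean(axis=1), w.std(axis=1)], axis=1)

rng = np.random.default_rng(42)
eda = rng.normal(loc=0.5, scale=0.1, size=240)   # toy electrodermal trace
hr  = rng.normal(loc=75.0, scale=5.0, size=240)  # toy heart-rate trace

# 60-sample windows -> 4 windows, (mean, std) per modality, concatenated.
feats = np.hstack([window_features(eda, 60), window_features(hr, 60)])
assert feats.shape == (4, 4)
```

Each row is one window's multimodal feature vector, ready to be labeled with the session's stress condition and fed to any classifier.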