DIPSER: A Dataset for In-Person Student Engagement Recognition in the Wild

📅 2025-02-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the challenge of accurately recognizing students’ attention and affective states in authentic classroom settings. To this end, we introduce the first multimodal dataset for student attentiveness recognition specifically designed for in-person instruction. Our methodology uniquely integrates synchronized multimodal data—including multi-view RGB video from environmental cameras, close-up facial videos, and physiological signals (e.g., PPG, EDA) captured via smartwatches—across diverse ethnic groups. Annotation employs a dual-verification strategy: consensus labeling by four domain experts combined with student self-reports, supporting a fine-grained joint attentiveness–affect labeling scheme. Technically, the pipeline incorporates multi-view visual representation learning, temporal modeling of physiological time series, and high-precision cross-modal temporal alignment. This dataset constitutes the most comprehensive benchmark to date for classroom engagement analysis, bridging a critical gap in fine-grained behavioral understanding within real-world educational contexts and enabling joint, interpretable modeling of attention and emotion.

📝 Abstract
In this paper, a novel dataset is introduced, designed to assess student attention within in-person classroom settings. This dataset encompasses RGB camera data, featuring multiple cameras per student to capture both posture and facial expressions, in addition to smartwatch sensor data for each individual. The dataset allows machine learning algorithms to be trained to predict attention and correlate it with emotion. A comprehensive suite of attention and emotion labels for each student is provided, generated through self-reporting as well as evaluations by four different experts. Our dataset uniquely combines facial and environmental camera data with smartwatch metrics, and includes ethnicities underrepresented in similar datasets, all within in-the-wild, in-person settings, making it the most comprehensive dataset of its kind currently available. The dataset offers an extensive and diverse collection of data on student interactions across different educational contexts, augmented with additional metadata from other tools. This initiative addresses existing deficiencies by offering a valuable resource for the analysis of student attention and emotion in face-to-face lessons.
Problem

Research questions and friction points this paper is trying to address.

Recognize student engagement in classrooms
Combine camera and smartwatch data for analysis
Address dataset gaps with diverse ethnicity inclusion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines RGB and smartwatch data
Includes diverse ethnic representation
Utilizes multi-expert labeled dataset
Luis Marquez-Carpintero
Institute for Computer Research, P.O. Box 99, 03080 Alicante, Spain
Sergio Suescun-Ferrandiz
Researcher in the RoViT group at the University of Alicante (Artificial Intelligence)
Carolina Lorenzo Álvarez
Faculty of Education, University of Alicante, 03690 Alicante, Spain
Jorge Fernández-Herrero
Faculty of Education, University of Alicante, 03690 Alicante, Spain
D. Viejo
Institute for Computer Research, P.O. Box 99, 03080 Alicante, Spain
Rosabel Roig-Vila
Faculty of Education, University of Alicante, 03690 Alicante, Spain
Miguel Cazorla
Institute for Computer Research, P.O. Box 99, 03080 Alicante, Spain