VisioPhysioENet: Multimodal Engagement Detection using Visual and Physiological Signals

📅 2024-09-24

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses fine-grained recognition of learner engagement from video. It proposes VisioPhysioENet, a multimodal framework that extracts features at two levels, visual and physiological: the visual module derives facial-landmark features with Dlib and additional estimates with OpenCV, and the physiological module applies the plane-orthogonal-to-skin (POS) method to the facial region of interest (ROI) to recover cardiovascular activity without contact sensors. Evaluated on the DAiSEE dataset, VisioPhysioENet reaches 63.09% accuracy in multi-level engagement classification, 8.6% better than the only other model that combines visual and physiological features, and it compares favorably with many existing methods. The results suggest that remotely measured physiological signals help discriminate engagement levels, supporting unobtrusive assessment of learner state.

📝 Abstract

This paper presents VisioPhysioENet, a novel multimodal system that leverages visual and physiological signals to detect learner engagement. It employs a two-level approach, extracting both visual and physiological features. For visual feature extraction, Dlib detects facial landmarks, while OpenCV provides additional estimations. The face recognition library, built on Dlib, identifies the facial region of interest used for physiological signal extraction. Physiological signals are then extracted with the plane-orthogonal-to-skin (POS) method to assess cardiovascular activity. These features are integrated using advanced machine learning classifiers, enhancing the detection of various levels of engagement. Tested on the DAiSEE dataset, VisioPhysioENet achieved an accuracy of 63.09%, identifying different levels of engagement better than many existing methods and performing 8.6% better than the only other model that uses both physiological and visual features.
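
To make the physiological step more concrete, here is a minimal sketch of the plane-orthogonal-to-skin projection as it is commonly described in the remote-photoplethysmography literature. It is not the authors' code. It assumes the facial ROI has already been reduced to one mean (r, g, b) value per frame. The function names `pos_pulse` and `estimate_bpm`, the 1.6 s window, and the 42 to 240 bpm search range are illustrative assumptions, not settings taken from the paper. Plain Python is used to keep the sketch dependency-free; NumPy would be the usual choice in practice.

```python
import cmath
import math


def _mean(xs):
    return sum(xs) / len(xs)


def _std(xs):
    m = _mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


def pos_pulse(rgb, fps, window_sec=1.6):
    """Turn per-frame mean skin colours [(r, g, b), ...] into a pulse signal.

    Hypothetical helper; window_sec is an assumed default.
    """
    n = len(rgb)
    win = int(window_sec * fps)  # sliding-window length in frames
    pulse = [0.0] * n
    for start in range(n - win + 1):
        seg = rgb[start:start + win]
        # Temporal normalisation: divide each channel by its window mean.
        mr, mg, mb = (_mean(ch) for ch in zip(*seg))
        # Project onto two axes orthogonal to the (1, 1, 1) intensity direction.
        s1 = [g / mg - b / mb for r, g, b in seg]
        s2 = [-2.0 * r / mr + g / mg + b / mb for r, g, b in seg]
        # Weight the second axis by the ratio of standard deviations.
        sd2 = _std(s2)
        alpha = _std(s1) / sd2 if sd2 > 0 else 0.0
        h = [a + alpha * b for a, b in zip(s1, s2)]
        # Overlap-add the zero-mean result of this window.
        h_mean = _mean(h)
        for i, v in enumerate(h):
            pulse[start + i] += v - h_mean
    return pulse


def estimate_bpm(pulse, fps, lo=42, hi=240):
    """Return the integer beats-per-minute with the largest DFT magnitude.

    Hypothetical helper; the lo/hi search range is an assumption.
    """
    def power(bpm):
        w = -2j * math.pi * (bpm / 60.0) / fps
        return abs(sum(x * cmath.exp(w * t) for t, x in enumerate(pulse)))
    return max(range(lo, hi + 1), key=power)
```

Because both projection axes sum to zero across the three channels, brightness changes that affect all channels equally are largely cancelled, which is the main reason this family of methods tolerates moderate lighting variation. The resulting pulse signal, or statistics derived from it such as the estimated heart rate, could then be combined with the visual features before classification.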

Problem

Research questions and friction points this paper is trying to address.

Student Engagement

Class Participation

Attention Level

Innovation

Methods, ideas, or system contributions that make the work stand out.

VisioPhysioENet

Multimodal Engagement Assessment

Machine Learning Integration

🔎 Similar Papers

Empathy Detection from Text, Audiovisual, Audio or Physiological Signals: A Systematic Review of Task Formulations and Machine Learning Methods