🤖 AI Summary
This study addresses the lack of a systematic characterization of multimodal affective computing datasets with continuous valence–arousal annotations. We survey 25 such datasets published between 2008 and 2024, analyzing their scale, participant demographics, sensor modalities (e.g., EEG, ECG, facial video, speech), annotation protocols, and data formats. Through cross-dataset comparison and methodological evaluation, we chart the technical evolution and application distribution of these resources for the first time. Our findings show that camera-based acquisition dominates the field, typically paired with complementary modalities, and that fusing physiological and behavioral signals yields measurable performance gains. The study offers an empirically grounded guide for dataset selection, model design, and real-world deployment of affective computing systems, particularly in human–computer interaction, mental health monitoring, and autonomous driving.
📝 Abstract
Understanding human affect has applications in robotics, marketing, education, human-computer interaction, healthcare, entertainment, autonomous driving, and psychology, where it can enhance decision-making, personalize experiences, and improve emotional well-being. This work presents a comprehensive overview of affect inference datasets that use continuous valence and arousal labels. We review 25 datasets published between 2008 and 2024, examining key factors such as dataset size, subject distribution, sensor configurations, annotation scales, and data formats for valence and arousal values. While camera-based datasets dominate the field, we also identify several widely used multimodal combinations. Additionally, we explore the most common approaches to affect detection applied to these datasets, providing insight into prevailing methodologies. Our overview of sensor fusion approaches shows promising gains in model performance for valence and arousal inference.
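As a rough illustration of the kind of decision-level (late) fusion the survey discusses, the sketch below fuses per-modality regressors for continuous valence–arousal prediction. This is a minimal, hypothetical example: the modality names, feature dimensions, synthetic labels, ridge regressors, and fusion weights are illustrative assumptions, not methods or results from the surveyed datasets.

```python
# Minimal sketch of late (decision-level) fusion for continuous
# valence-arousal regression. All data here is synthetic; in practice the
# features would come from two sensor modalities of a real dataset.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n = 200                                   # synthetic samples standing in for annotated frames
X_face = rng.normal(size=(n, 64))         # hypothetical facial-video features
X_ecg = rng.normal(size=(n, 16))          # hypothetical ECG-derived features
y = rng.uniform(-1.0, 1.0, size=(n, 2))   # continuous [valence, arousal] labels in [-1, 1]

# One regressor per modality; Ridge handles the two-target output directly.
face_model = Ridge(alpha=1.0).fit(X_face, y)
ecg_model = Ridge(alpha=1.0).fit(X_ecg, y)

# Fuse the per-modality predictions with a weighted average. The weights are
# placeholders; they would normally be tuned on a validation split.
w_face, w_ecg = 0.6, 0.4
fused = w_face * face_model.predict(X_face) + w_ecg * ecg_model.predict(X_ecg)
print(fused[:3])                          # first three fused [valence, arousal] estimates
```

Late fusion is only one of several schemes covered by such surveys; early (feature-level) fusion would instead concatenate `X_face` and `X_ecg` before fitting a single model.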