🤖 AI Summary
This study addresses the reliability and interpretability of tactile–auditory bimodal emotional expression in human–robot interaction. To this end, we synchronously recorded spontaneous tactile gestures and accompanying vocalizations using piezoresistive pressure sensors and a microphone array, constructing a cross-subject multimodal dataset spanning 10 emotion categories. We first empirically demonstrate significant inter-subject consistency in tactile emotional expression, a novel finding. We then propose a tactile–auditory fusion feature modeling approach, revealing that emotion confusion is predominantly governed by similarity along the arousal and valence dimensions. Using SVM classification, the average accuracy across all 10 emotions reaches 40%, with "Attention" achieving a balanced accuracy of 87.65%. Our core contributions are threefold: (1) establishing the cross-subject robustness of tactile affective expression; (2) introducing the first tactile–auditory collaborative framework for emotion recognition; and (3) elucidating the cognitive origin of poorly differentiated emotions, namely low discriminability arising from overlapping arousal and valence dimensions.
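As a rough illustration of the fusion-then-SVM pipeline described above, here is a minimal sketch in Python/scikit-learn: tactile and audio feature vectors are concatenated (early fusion) and fed to an RBF-kernel SVM. The feature sets, their dimensionalities, and the data layout are illustrative assumptions, not the authors' exact design.

```python
# Hypothetical sketch of feature-level (early) fusion of tactile and audio
# descriptors fed to an SVM classifier. Feature names, dimensions, and the
# 28-participants-x-10-emotions layout are assumptions for illustration.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_samples = 280  # e.g., 28 participants x 10 emotions (assumed)

# Placeholder features: per-gesture tactile statistics (peak pressure,
# contact duration, contact area, ...) and audio statistics (e.g., MFCC means).
X_touch = rng.normal(size=(n_samples, 12))
X_audio = rng.normal(size=(n_samples, 26))
X = np.hstack([X_touch, X_audio])        # early fusion: concatenate modalities
y = rng.integers(0, 10, size=n_samples)  # 10 emotion labels

# Standardize so neither modality dominates the RBF kernel distances.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(clf, X, y, cv=5)
print(f"mean CV accuracy: {scores.mean():.3f}")
```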
📝 Abstract
Human emotions can be conveyed through nuanced touch gestures. However, little is known about how consistently emotions can be conveyed to robots through touch. This study explores the consistency of touch-based emotional expression toward a robot by integrating tactile and auditory sensing of affective haptic expressions. We developed a piezoresistive pressure sensor and used a microphone to emulate the touch and sound channels, respectively. In a study with 28 participants, each conveyed 10 emotions to a robot using spontaneous touch gestures. Our findings reveal a statistically significant consistency in emotion expression among participants, although some emotions yielded low intraclass correlation values. Additionally, certain emotions with similar levels of arousal or valence did not differ significantly in how they were conveyed. We subsequently constructed a multi-modal model integrating touch and audio features to decode the 10 emotions. A support vector machine (SVM) model achieved the highest accuracy, 40% across the 10 classes, with "Attention" being the most accurately conveyed emotion at a balanced accuracy of 87.65%.
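The consistency analysis in the abstract relies on intraclass correlation. Below is a minimal one-way random-effects ICC(1,1) computation; the data layout (emotions as targets, participants as raters, one scalar tactile feature per cell) is an assumption about how such a check might be set up, not the paper's exact procedure.

```python
# Minimal one-way ICC(1,1) sketch for checking cross-participant consistency
# of a tactile feature (e.g., peak pressure per emotion). Data layout is an
# assumption: rows = targets (emotions), columns = raters (participants).
import numpy as np

def icc_1_1(ratings: np.ndarray) -> float:
    """One-way random-effects ICC(1,1); ratings has shape (n_targets, n_raters)."""
    n, k = ratings.shape
    grand_mean = ratings.mean()
    target_means = ratings.mean(axis=1)
    # Between-target and within-target mean squares from one-way ANOVA.
    ms_between = k * ((target_means - grand_mean) ** 2).sum() / (n - 1)
    ms_within = ((ratings - target_means[:, None]) ** 2).sum() / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

rng = np.random.default_rng(1)
# Synthetic demo: 10 emotions x 28 participants, with per-emotion offsets
# so the targets are actually distinguishable.
demo = rng.normal(size=(10, 28)) + np.arange(10)[:, None]
print(f"ICC(1,1): {icc_1_1(demo):.3f}")
```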