🤖 AI Summary
Prior research has not systematically investigated how robots can convey emotions and social touch through coordinated tactile–auditory multimodal interaction.
Method: We developed a multimodal interface integrating a 25-element vibrotactile array with synchronized audio, and conducted psychology-informed experiments to assess human recognition of compound emotional stimuli using arousal and valence ratings.
Contribution/Results: This study provides the first empirical evidence of complementary effects between tactile and auditory modalities in emotion communication: each modality exhibits emotion-specific strengths—tactile signals better encode intensity, while auditory signals more effectively discriminate valence—whereas social gestures alone lack sufficient emotional specificity. Multimodal fusion significantly enhances emotion decoding accuracy (mean improvement +23.6%). These findings confirm that cross-modal sensory integration is a critical mechanism for improving affective human–robot interaction, offering both theoretical foundations and practical design guidelines for embodied affective interfaces.
📝 Abstract
Affective tactile interaction constitutes a fundamental component of human communication. In natural human-human encounters, touch is seldom experienced in isolation; rather, it is inherently multisensory. Individuals not only perceive the physical sensation of touch but also register the accompanying auditory cues generated through contact. The integration of haptic and auditory information forms a rich and nuanced channel for emotional expression. While extensive research has examined how robots convey emotions through facial expressions and speech, their capacity to communicate social gestures and emotions via touch remains largely underexplored. To address this gap, we developed a multimodal interaction system incorporating a 5*5 grid of 25 vibration motors synchronized with audio playback, enabling robots to deliver combined haptic-audio stimuli. In an experiment involving 32 Chinese participants, ten emotions and six social gestures were presented through vibration, sound, or their combination. Participants rated each stimulus on arousal and valence scales. The results revealed that (1) the combined haptic-audio modality significantly enhanced decoding accuracy compared to single modalities; (2) each individual channel-vibration or sound-effectively supported certain emotions recognition, with distinct advantages depending on the emotional expression; and (3) gestures alone were generally insufficient for conveying clearly distinguishable emotions. These findings underscore the importance of multisensory integration in affective human-robot interaction and highlight the complementary roles of haptic and auditory cues in enhancing emotional communication.