Sound-Based Recognition of Touch Gestures and Emotions for Enhanced Human-Robot Interaction

📅 2024-12-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the absence of tactile and affective perception in privacy-sensitive human-robot interaction scenarios, this paper proposes the first audio-only paradigm for tactile interaction understanding: recognizing touch gestures and estimating arousal-valence emotions solely from contact-induced acoustic signals, without visual or tactile sensors. The method employs a lightweight CNN-LSTM hybrid architecture that jointly extracts time-frequency features and optimizes gesture classification (6 classes) and continuous emotion regression via multi-task learning. Evaluated on a newly collected dataset of touch-acoustic recordings from 28 participants interacting with a Pepper robot, the model matches the accuracy of state-of-the-art PANNs while cutting computational cost by over 90%. With only 0.24M parameters (0.94 MB storage, 0.7G FLOPs), it supports variable-length inputs and exhibits low inference latency, enabling GDPR-compliant, sensor-free affective HRI.
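
The paper's exact layer configuration is not reproduced in this entry, so the following is a minimal sketch of the audio-only multi-task idea it describes: a small CNN front end over log-mel spectrograms, an LSTM over the time axis to absorb variable-length input, and two heads for 6-way gesture classification and 2-dimensional arousal-valence regression. The `TouchSoundNet` name and all layer sizes are illustrative assumptions, not the authors' published design.

```python
# Minimal sketch of the CNN-LSTM multi-task idea described above.
# Layer sizes, feature choices, and head designs are assumptions for
# illustration; the paper's exact architecture is not reproduced here.
import torch
import torch.nn as nn

class TouchSoundNet(nn.Module):
    def __init__(self, n_mels: int = 64, n_gestures: int = 6):
        super().__init__()
        # Small CNN front end over (batch, 1, mel, time) spectrograms.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
            nn.MaxPool2d((2, 2)),
            nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d((2, 2)),
        )
        # LSTM over the pooled time axis handles variable-length input.
        self.lstm = nn.LSTM(32 * (n_mels // 4), 64, batch_first=True)
        self.gesture_head = nn.Linear(64, n_gestures)  # 6-way classification
        self.emotion_head = nn.Linear(64, 2)           # arousal, valence

    def forward(self, mel: torch.Tensor):
        # mel: (batch, 1, n_mels, time) log-mel spectrogram
        x = self.cnn(mel)                      # (B, C, n_mels/4, T/4)
        x = x.permute(0, 3, 1, 2).flatten(2)   # (B, T/4, C * n_mels/4)
        _, (h, _) = self.lstm(x)               # last hidden state
        h = h[-1]
        return self.gesture_head(h), self.emotion_head(h)

def multitask_loss(gesture_logits, emotion_pred, gesture_y, emotion_y, alpha=0.5):
    # Joint objective: cross-entropy for gestures + MSE for arousal-valence.
    ce = nn.functional.cross_entropy(gesture_logits, gesture_y)
    mse = nn.functional.mse_loss(emotion_pred, emotion_y)
    return ce + alpha * mse
```

Sharing a single backbone between the two tasks, with only the linear heads differing, is what keeps the parameter budget in the sub-million range the summary cites.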

📝 Abstract
Emotion recognition and touch gesture decoding are crucial for advancing human-robot interaction (HRI), especially in social environments where emotional cues and tactile perception play important roles. However, many humanoid robots, such as Pepper, Nao, and Furhat, lack full-body tactile skin, limiting their ability to engage in touch-based emotional and gesture interactions. In addition, vision-based emotion recognition methods usually face strict GDPR compliance challenges due to the need to collect personal facial data. To address these limitations and avoid privacy issues, this paper studies the potential of using the sounds produced by touch during HRI to recognise tactile gestures and classify emotions along the arousal and valence dimensions. Using a dataset of tactile gestures and emotional interactions collected from 28 participants with the humanoid robot Pepper, we design an audio-only lightweight touch gesture and emotion recognition model with only 0.24M parameters, a 0.94 MB model size, and 0.7G FLOPs. Experimental results show that the proposed sound-based touch gesture and emotion recognition model effectively recognises the arousal and valence states of different emotions, as well as various tactile gestures, even when the input audio length varies. The proposed model is low-latency and achieves results similar to well-known pretrained audio neural networks (PANNs), but with much smaller FLOPs, parameter count, and model size.
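
As a usage illustration of the variable-length claim, the sketch below feeds synthetic clips of different durations through the hypothetical `TouchSoundNet` above and prints its parameter count. The 16 kHz sample rate and 64-bin mel front end are assumptions for the example, not the paper's documented preprocessing.

```python
# Usage sketch: variable-length touch audio in, joint predictions out.
import torch
import torchaudio

mel = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=64)
to_db = torchaudio.transforms.AmplitudeToDB()

model = TouchSoundNet()
print(f"params: {sum(p.numel() for p in model.parameters()) / 1e6:.2f}M")

for seconds in (1.0, 2.5, 4.0):                  # variable-length clips
    wav = torch.randn(1, int(16000 * seconds))   # stand-in for a recording
    spec = to_db(mel(wav)).unsqueeze(0)          # (1, 1, 64, time)
    gesture_logits, arousal_valence = model(spec)
    print(seconds, gesture_logits.shape, arousal_valence.shape)
```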
Problem

Research questions and friction points this paper is trying to address.

Tactile Sound Recognition
Emotion Perception
Privacy Preservation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sound Recognition
Emotion and Touch Sensing
Lightweight Model