🤖 AI Summary
This study addresses tactile gesture recognition in industrial human–robot collaboration, proposing a novel paradigm that relies solely on built-in joint sensors—eliminating the need for external tactile hardware. Methodologically, joint torque and position signals are transformed into time-frequency spectrograms via short-time Fourier transform (STFT), then modeled using both 2D-CNN (STFT2DCNN) and 3D-CNN (STFT3DCNN) architectures to capture spatiotemporal dynamics. The key contribution is the first systematic validation that joint-sensor-derived spectral representations effectively support both contact detection and fine-grained gesture classification, with strong cross-pose generalization. Experimental evaluation on the Franka Emika robotic platform achieves >95% accuracy for both contact detection and gesture classification, demonstrating the feasibility of low-cost, highly scalable, proprioceptive-only tactile perception.
📝 Abstract
While gesture recognition using vision or robot skins is an active research area in Human-Robot Collaboration (HRC), this paper explores deep learning methods relying solely on a robot's built-in joint sensors, eliminating the need for external sensors. We evaluated various convolutional neural network (CNN) architectures and collected two datasets to study the impact of data representation and model architecture on recognition accuracy. Our results show that spectrogram-based representations significantly improve accuracy, while model architecture plays a smaller role. We also tested generalization to new robot poses, where spectrogram-based models performed better. Implemented on a Franka Emika Research robot, two of our methods, STFT2DCNN and STFT3DCNN, achieved over 95% accuracy in contact detection and gesture classification. These findings demonstrate the feasibility of external-sensor-free tactile recognition and motivate further research toward cost-effective, scalable solutions for HRC.
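The front end of the pipeline described above can be illustrated with a short sketch: per-channel joint signals are converted into magnitude spectrograms via the short-time Fourier transform, producing the image-like stack a 2D or 3D CNN would consume. This is a minimal illustration, not the authors' implementation; the sampling rate, window length, and use of synthetic random signals are assumptions made for the example.

```python
import numpy as np
from scipy.signal import stft

# Assumed parameters (not from the paper): sampling rate and STFT window.
fs = 1000            # joint-sensor sampling rate in Hz (assumption)
n_joints = 7         # the Franka Emika arm has 7 joints
T = 2 * fs           # a 2-second signal window (assumption)

# Synthetic stand-in for recorded data: one torque and one position
# channel per joint, so 14 channels total.
rng = np.random.default_rng(0)
signals = rng.standard_normal((2 * n_joints, T))

# Short-time Fourier transform per channel; the magnitude of the
# complex STFT output is the time-frequency spectrogram.
specs = []
for channel in signals:
    freqs, times, Z = stft(channel, fs=fs, nperseg=128, noverlap=64)
    specs.append(np.abs(Z))

# Stack into (channels, freq_bins, time_frames) — a 2D CNN can treat
# channels as input planes; a 3D CNN can convolve across all three axes.
spec_stack = np.stack(specs)
print(spec_stack.shape)
```

With `nperseg=128`, each spectrogram has `128 // 2 + 1 = 65` frequency bins; the resulting tensor is what distinguishes the spectrogram-based representation from feeding raw time series to the network.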