🤖 AI Summary
Dexterous robot hands rarely have tactile sensing across the whole hand (fingertips, phalanges, and palm). Magnetic skins can provide that coverage, but their flux signals are hard to interpret and calibrate, and general-purpose models for them are lacking. This paper introduces Sparsh-skin, a pre-trained encoder for full-hand magnetic tactile skins. The encoder is trained by self-supervised self-distillation on unlabeled hand-object interactions, collected with an Allegro hand sensorized with Xela uSkin, and maps a history of kinematic and tactile readings to a latent tactile embedding. That embedding can be reused in downstream tasks ranging from state estimation to policy learning. Across several benchmark tasks, the pre-trained representations are sample efficient and improve task performance by over 41% compared to prior work and by over 56% compared to end-to-end learning.
📝 Abstract
We present Sparsh-skin, a pre-trained encoder for magnetic skin sensors distributed across the fingertips, phalanges, and palm of a dexterous robot hand. Magnetic tactile skins offer a flexible form factor for hand-wide coverage with fast response times, in contrast to vision-based tactile sensors that are restricted to the fingertips and limited by bandwidth. Full-hand tactile perception is crucial for robot dexterity. However, a lack of general-purpose models, together with challenges in interpreting magnetic flux and in calibration, has limited the adoption of these sensors. Sparsh-skin, given a history of kinematic and tactile sensing across a hand, outputs a latent tactile embedding that can be used in any downstream task. The encoder is self-supervised via self-distillation on a variety of unlabeled hand-object interactions using an Allegro hand sensorized with Xela uSkin. In experiments across several benchmark tasks, from state estimation to policy learning, we find that pre-trained Sparsh-skin representations are both sample efficient in learning downstream tasks and improve task performance by over 41% compared to prior work and over 56% compared to end-to-end learning.
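To make the phrase "self-supervised via self-distillation" concrete, the following is a minimal sketch of one common form of self-distillation, not the paper's implementation. The abstract does not specify the architecture, loss, or masking scheme, so everything here is an assumption for illustration: the linear encoder, the mean-squared-error loss, the zero-masking, the exponential-moving-average (EMA) teacher, and all function names and toy dimensions are hypothetical.

```python
import random


def make_window(tactile_history, joint_history):
    """Flatten T time steps of (tactile, kinematic) readings into one input vector."""
    window = []
    for tactile, joints in zip(tactile_history, joint_history):
        window.extend(tactile)
        window.extend(joints)
    return window


def mask_window(window, masked_indices):
    """Zero out selected entries so the student sees a corrupted view (assumed scheme)."""
    masked = set(masked_indices)
    return [0.0 if i in masked else v for i, v in enumerate(window)]


def encode(weights, x):
    """Stand-in linear encoder: one dot product per embedding dimension."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def distill_loss(student_out, teacher_out):
    """Mean squared error between student and teacher embeddings (assumed loss)."""
    return sum((s - t) ** 2 for s, t in zip(student_out, teacher_out)) / len(student_out)


def ema_update(teacher, student, momentum=0.99):
    """Move each teacher weight toward the matching student weight."""
    return [
        [momentum * t + (1.0 - momentum) * s for t, s in zip(t_row, s_row)]
        for t_row, s_row in zip(teacher, student)
    ]


# Toy data: 3 time steps, 4 tactile values and 2 joint angles per step (made-up sizes).
tactile_history = [[0.1, 0.2, 0.3, 0.4] for _ in range(3)]
joint_history = [[0.5, -0.5] for _ in range(3)]
window = make_window(tactile_history, joint_history)

rng = random.Random(0)
student = [[rng.uniform(-1.0, 1.0) for _ in window] for _ in range(2)]
teacher = [row[:] for row in student]  # teacher starts as a copy of the student

# The student encodes a masked view; the teacher encodes the full view.
student_out = encode(student, mask_window(window, [0, 1, 6]))
teacher_out = encode(teacher, window)
loss = distill_loss(student_out, teacher_out)

# A gradient step on the student would go here (omitted); the teacher then follows by EMA.
teacher = ema_update(teacher, student)
```

No labels appear anywhere in the loop: the training signal comes only from asking the student to match the teacher on a corrupted view of the same unlabeled window, which is why this style of pre-training suits large collections of unlabeled hand-object interactions.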