Tactile Beyond Pixels: Multisensory Touch Representations for Robot Manipulation

πŸ“… 2025-06-17
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the challenge of learning unified, robust, and generalizable multimodal tactile representations to enhance task performance and physical-property reasoning in robotic dexterous manipulation. To this end, the authors propose Sparsh-X, the first self-supervised representation model that jointly encodes image, audio, motion, and pressure tactile signals, trained on a million-scale real-world contact dataset to capture complementary spatiotemporal features. Key contributions include: (1) the first unified representation framework for multimodal tactile signals; (2) a tactile pretraining paradigm designed for physical-property perception; and (3) schemes for integrating touch representations into imitation learning and for tactile adaptation of sim-trained policies. In evaluation, Sparsh-X achieves a 63% improvement in policy success rate over an end-to-end model using tactile images, a 90% gain in robustness when recovering object states from touch, and a 48% increase in physical-property classification accuracy.

πŸ“ Abstract
We present Sparsh-X, the first multisensory touch representation spanning four tactile modalities: image, audio, motion, and pressure. Trained on ~1M contact-rich interactions collected with the Digit 360 sensor, Sparsh-X captures complementary touch signals at diverse temporal and spatial scales. By leveraging self-supervised learning, Sparsh-X fuses these modalities into a unified representation that captures physical properties useful for robot manipulation tasks. We study how to effectively integrate real-world touch representations for both imitation learning and tactile adaptation of sim-trained policies, showing that Sparsh-X boosts policy success rates by 63% over an end-to-end model using tactile images and improves robustness by 90% in recovering object states from touch. Finally, we benchmark Sparsh-X's ability to make inferences about physical properties, such as object-action identification, material-quantity estimation, and force estimation. Sparsh-X improves accuracy in characterizing physical properties by 48% compared to end-to-end approaches, demonstrating the advantages of multisensory pretraining for capturing features essential for dexterous manipulation.
Problem

Research questions and friction points this paper is trying to address.

How can unified multisensory touch representations be learned for robot manipulation?
How can tactile representations improve policy success rates and robustness when adapting sim-trained policies?
How can touch signals be used to accurately infer physical properties for dexterous manipulation?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multisensory touch representations across four modalities
Self-supervised learning for unified tactile representation
Boosts policy success rates by 63%
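The fusion idea above can be illustrated with a minimal sketch: each tactile modality gets its own encoder projecting into a shared embedding space, and the per-modality embeddings are pooled into one unified touch representation. This is a toy stand-in, not the paper's architecture — the feature dimensions, the random linear encoders, and the mean-pooling fusion are all assumptions for illustration; Sparsh-X learns its fusion via self-supervised pretraining.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical raw feature sizes per tactile modality (not from the paper).
MODALITY_DIMS = {"image": 64, "audio": 32, "motion": 16, "pressure": 8}
EMBED_DIM = 48  # shared embedding size, chosen arbitrarily for this sketch

# One random linear encoder per modality, mapping raw features into the
# shared space. A real model would learn these weights.
encoders = {m: rng.normal(scale=d ** -0.5, size=(d, EMBED_DIM))
            for m, d in MODALITY_DIMS.items()}

def encode(signals):
    """Fuse per-modality signals into one unified touch embedding.

    signals: dict mapping modality name -> 1-D feature vector.
    Fusion here is a simple mean over projected embeddings, followed by
    unit normalization; the actual model uses learned multimodal fusion.
    """
    embeddings = [signals[m] @ encoders[m] for m in MODALITY_DIMS]
    fused = np.mean(embeddings, axis=0)
    return fused / np.linalg.norm(fused)

# Example: random stand-ins for one contact event's four signals.
signals = {m: rng.normal(size=d) for m, d in MODALITY_DIMS.items()}
z = encode(signals)
print(z.shape)  # one fixed-size vector regardless of modality count
```

A downstream policy or physical-property classifier would consume `z` directly, which is what makes a single shared representation convenient across tasks.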
πŸ”Ž Similar Papers
No similar papers found.