AI Summary
This work addresses the challenge of learning unified, robust, and generalizable multimodal tactile representations to enhance task performance and physical property reasoning in robotic dexterous manipulation. To this end, we propose Sparsh-X, the first self-supervised representation model that jointly encodes image, audio, motion, and pressure tactile signals, trained on a million-scale real-world contact dataset to capture complementary spatiotemporal features. Key contributions include: (1) the first unified representation framework for multimodal tactile signals; (2) a novel tactile pretraining paradigm explicitly designed for physical property perception; and (3) a joint tactile-action representation learning scheme coupled with a real-to-simulation transfer adaptation framework. Evaluated on standard benchmarks, Sparsh-X achieves a 63% improvement in policy success rate, a 90% gain in object state recovery robustness, and a 48% increase in physical property classification accuracy.
Abstract
We present Sparsh-X, the first multisensory touch representation model spanning four tactile modalities: image, audio, motion, and pressure. Trained on ~1M contact-rich interactions collected with the Digit 360 sensor, Sparsh-X captures complementary touch signals at diverse temporal and spatial scales. By leveraging self-supervised learning, Sparsh-X fuses these modalities into a unified representation that captures physical properties useful for robot manipulation tasks. We study how to effectively integrate real-world touch representations for both imitation learning and tactile adaptation of sim-trained policies, showing that Sparsh-X boosts policy success rates by 63% over an end-to-end model using tactile images and improves robustness by 90% in recovering object states from touch. Finally, we benchmark Sparsh-X's ability to infer physical properties, such as object-action identification, material-quantity estimation, and force estimation. Sparsh-X improves accuracy in characterizing physical properties by 48% compared to end-to-end approaches, demonstrating the advantages of multisensory pretraining for capturing features essential for dexterous manipulation.
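To make the multimodal-fusion idea concrete, below is a minimal, hypothetical PyTorch sketch of a tactile encoder in the spirit of Sparsh-X: one lightweight encoder per Digit 360 modality, with a small transformer that fuses the four modality tokens into a unified touch embedding. All module names, input shapes, and sizes here are illustrative assumptions, not the paper's actual architecture; the sketch only shows the fusion interface, not the self-supervised pretraining itself.

```python
# Hypothetical multimodal tactile encoder (illustrative only).
# Assumed input shapes: image (B, 3, 64, 64); audio (B, 1, T);
# motion/IMU (B, 6, T); pressure (B, 1, T).
import torch
import torch.nn as nn


class TactileFusionEncoder(nn.Module):
    def __init__(self, dim: int = 256, n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        # One lightweight encoder per modality, each mapping to a dim-sized token.
        self.image_enc = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, dim, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.audio_enc = nn.Sequential(
            nn.Conv1d(1, dim, 9, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        )
        self.motion_enc = nn.Sequential(
            nn.Conv1d(6, dim, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        )
        self.pressure_enc = nn.Sequential(
            nn.Conv1d(1, dim, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        )
        # A learned [CLS]-style token, refined by a small transformer over the
        # modality tokens, serves as the unified touch representation.
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, n_layers)

    def forward(self, image, audio, motion, pressure):
        tokens = torch.stack([
            self.image_enc(image),
            self.audio_enc(audio),
            self.motion_enc(motion),
            self.pressure_enc(pressure),
        ], dim=1)                                   # (B, 4, dim)
        cls = self.cls.expand(tokens.size(0), -1, -1)
        fused = self.fusion(torch.cat([cls, tokens], dim=1))
        return fused[:, 0]                          # unified touch embedding


# Usage with dummy data (stream lengths are arbitrary assumptions):
enc = TactileFusionEncoder()
z = enc(torch.randn(2, 3, 64, 64),   # tactile image
        torch.randn(2, 1, 4000),     # contact audio
        torch.randn(2, 6, 200),      # IMU motion
        torch.randn(2, 1, 200))      # pressure
print(z.shape)  # torch.Size([2, 256])
```

A single fused embedding like `z` is the kind of representation a downstream policy head or physical-property classifier would consume; in the paper's setting it would come from the pretrained Sparsh-X encoder rather than a model trained from scratch.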