Canonical Representation and Force-Based Pretraining of 3D Tactile for Dexterous Visuo-Tactile Policy Learning

📅 2024-09-26
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
📄 PDF
🤖 AI Summary
Large-scale labeled data and effective pretraining paradigms are lacking for 3D tactile feature learning. Method: This paper proposes a force-driven self-supervised pretraining framework. Contribution/Results: (1) It introduces the first standardized 3D tactile representation, unifying spatial and force information via normalized coordinate mapping and force-field encoding; (2) it designs a local–global force-aware contrastive self-supervised task to jointly learn spatial–force dual-channel representations; (3) it supports vision–tactile multimodal policy fine-tuning. Evaluated on four real-world fine manipulation tasks, the framework achieves an average success rate of 78%, significantly outperforming prior methods. This work is the first to empirically validate that force-perception-guided tactile representation enhances both effectiveness and generalization in contact-intensive dexterous manipulation.

Technology Category

Application Category

📝 Abstract
Tactile sensing plays a vital role in enabling robots to perform fine-grained, contact-rich tasks. However, the high dimensionality of tactile data, due to the large coverage on dexterous hands, poses significant challenges for effective tactile feature learning, especially for 3D tactile data, as there are no large standardized datasets and no strong pretrained backbones. To address these challenges, we propose a novel canonical representation that reduces the difficulty of 3D tactile feature learning and further introduces a force-based self-supervised pretraining task to capture both local and net force features, which are crucial for dexterous manipulation. Our method achieves an average success rate of 78% across four fine-grained, contact-rich dexterous manipulation tasks in real-world experiments, demonstrating effectiveness and robustness compared to other methods. Further analysis shows that our method fully utilizes both spatial and force information from 3D tactile data to accomplish the tasks. The videos can be viewed at https://3dtacdex.github.io.
Problem

Research questions and friction points this paper is trying to address.

Reduces 3D tactile feature learning difficulty
Introduces force-based pretraining for manipulation
Improves success in dexterous manipulation tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Canonical representation simplifies 3D tactile learning
Force-based pretraining captures essential force features
Method utilizes spatial and force information effectively
🔎 Similar Papers
No similar papers found.
T
Tianhao Wu
Center on Frontiers of Computing Studies, School of Computer Science, Peking University, Beijing 100871, China, also with PKU-Agibot Lab, School of Computer Science, Peking University, Beijing 100871, China, and also with National Key Laboratory for Multimedia Information Processing, School of Computer Science, Peking University
Jinzhou Li
Jinzhou Li
Duke University
RoboticsDeep Reinforcement LearningManipulation
Jiyao Zhang
Jiyao Zhang
Peking University
Embodied AIRobotics3D Vision
Mingdong Wu
Mingdong Wu
Peking University
Embodied AIReinforcement LearningGenerative Model
H
Hao Dong
Center on Frontiers of Computing Studies, School of Computer Science, Peking University, Beijing 100871, China, also with PKU-Agibot Lab, School of Computer Science, Peking University, Beijing 100871, China, and also with National Key Laboratory for Multimedia Information Processing, School of Computer Science, Peking University