🤖 AI Summary
Problem: Large-scale labeled datasets and effective pretraining paradigms are lacking for 3D tactile feature learning. Method: This paper proposes a force-driven self-supervised pretraining framework. Contribution/Results: (1) It introduces the first standardized (canonical) 3D tactile representation, unifying spatial and force information via normalized coordinate mapping and force-field encoding; (2) it designs a local–global force-aware contrastive self-supervised task to jointly learn spatial–force dual-channel representations; (3) it supports vision–tactile multimodal policy fine-tuning. Evaluated on four real-world fine-grained manipulation tasks, the framework achieves an average success rate of 78%, significantly outperforming prior methods. This work is the first to empirically validate that force-aware tactile representations improve both effectiveness and generalization in contact-rich dexterous manipulation.
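To make the canonical representation concrete, here is a minimal sketch of one plausible reading of "normalized coordinate mapping and force-field encoding": taxel positions are rescaled into a common normalized frame and paired with their per-taxel force vectors to form a spatial–force point cloud. All function names, shapes, and the specific normalization are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def canonical_tactile_representation(positions, forces):
    """Build an (N, 6) spatial-force point cloud from raw taxel readings.

    positions: (N, 3) taxel centers in the hand's base frame [m]
    forces:    (N, 3) per-taxel 3D contact forces [N]
    """
    # Normalized coordinate mapping (assumed): center the taxel cloud and
    # scale it into the unit ball so representations are comparable across
    # different hands and sensor layouts.
    centroid = positions.mean(axis=0)
    centered = positions - centroid
    scale = np.linalg.norm(centered, axis=1).max() + 1e-8
    coords = centered / scale

    # Force-field encoding (assumed): attach the 3D force vector to each
    # normalized coordinate, giving a dual-channel (spatial + force) input.
    return np.concatenate([coords, forces], axis=1)
```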
📝 Abstract
Tactile sensing plays a vital role in enabling robots to perform fine-grained, contact-rich tasks. However, the high dimensionality of tactile data, caused by the large sensing coverage on dexterous hands, poses significant challenges for effective tactile feature learning. This is especially true for 3D tactile data, for which no large standardized datasets or strong pretrained backbones exist. To address these challenges, we propose a novel canonical representation that reduces the difficulty of 3D tactile feature learning, and we further introduce a force-based self-supervised pretraining task to capture both local and net force features, which are crucial for dexterous manipulation. Our method achieves an average success rate of 78% across four fine-grained, contact-rich dexterous manipulation tasks in real-world experiments, demonstrating its effectiveness and robustness compared to other methods. Further analysis shows that our method fully exploits both the spatial and force information in 3D tactile data to accomplish the tasks. Videos are available at https://3dtacdex.github.io.
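The abstract's force-based self-supervised task can be illustrated with a small PyTorch sketch: a contrastive loss between two augmented views of a tactile point cloud (capturing local structure) combined with a regression head that recovers the net force, i.e., the taxel-wise sum (the global force feature). The encoder, loss weighting, and augmentation scheme below are hypothetical stand-ins, not the paper's actual architecture or objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TactileEncoder(nn.Module):
    """Toy per-cloud encoder: shared MLP over taxels + max pooling."""
    def __init__(self, in_dim=6, feat_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, feat_dim),
        )
        self.force_head = nn.Linear(feat_dim, 3)  # predicts net 3D force

    def forward(self, x):                    # x: (B, N, 6) spatial-force points
        z = self.mlp(x).max(dim=1).values    # (B, feat_dim) global embedding
        return z, self.force_head(z)

def info_nce(z1, z2, tau=0.1):
    """Contrast matching views across the batch (NT-Xent style)."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau               # (B, B) similarity matrix
    targets = torch.arange(z1.size(0))       # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

# One pretraining step under these assumptions: view1/view2 stand in for two
# augmentations of the same tactile frames; the net-force target is the sum
# of the per-taxel force channels (last 3 dims of each point).
encoder = TactileEncoder()
view1, view2 = torch.randn(8, 32, 6), torch.randn(8, 32, 6)
z1, f1 = encoder(view1)
z2, _ = encoder(view2)
net_force = view1[..., 3:].sum(dim=1)        # (B, 3) ground-truth net force
loss = info_nce(z1, z2) + 0.5 * F.mse_loss(f1, net_force)
loss.backward()
```

The two terms mirror the local–global split described in the summary: the contrastive term shapes the embedding from local taxel patterns, while the net-force regression anchors it to a physically meaningful global quantity.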