Visuotactile-Based Learning for Insertion with Compliant Hands

📅 2024-11-10
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Flexible underactuated hands face significant challenges in contact-intensive insertion tasks: their inherent compliance and lack of proprioception lead to inaccurate pose estimation and high interaction uncertainty. To address this, we propose a vision–tactile multimodal policy learning framework. Our method introduces the first joint tactile–visual pose estimation approach for both the target object and the insertion receptacle; designs a teacher–student distillation-based Transformer policy network that enables sim-to-real transfer without real-world fine-tuning; and employs omnidirectional tactile sensing combined with an external depth camera to construct high-fidelity multimodal perceptual inputs. Experiments on a physical robot platform demonstrate substantial improvements in insertion success rate and pose localization accuracy. These results empirically validate the critical role of tactile feedback in improving pose-estimation robustness and enabling reliable manipulation.

📝 Abstract
Compared to rigid hands, underactuated compliant hands offer greater adaptability to object shapes, provide stable grasps, and are often more cost-effective. However, they introduce uncertainties in hand-object interactions due to their inherent compliance and lack of the precise finger proprioception found in rigid hands. These limitations become particularly significant when performing contact-rich tasks like insertion. To address these challenges, additional sensing modalities are required to enable robust insertion capabilities. This letter explores the essential sensing requirements for successful insertion tasks with compliant hands, focusing on the role of visuotactile perception. We propose a simulation-based multimodal policy learning framework that leverages all-around tactile sensing and an extrinsic depth camera. A transformer-based policy, trained through a teacher-student distillation process, is successfully transferred to a real-world robotic system without further training. Our results emphasize the crucial role of tactile sensing, in conjunction with visual perception, for accurate object-socket pose estimation, successful sim-to-real transfer, and robust task execution.
Problem

Research questions and friction points this paper is trying to address.

Address uncertainties in compliant hand-object interactions.
Enable robust insertion tasks using visuotactile perception.
Achieve sim-to-real transfer without additional training.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Simulation-based multimodal policy learning framework
Transformer-based policy with teacher-student distillation
All-around tactile sensing and extrinsic depth camera
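The teacher-student distillation idea above can be sketched in miniature: a "teacher" policy acts on privileged simulator state (exact object-socket pose), while a "student" sees only a noisy projection of that state, standing in for tactile and depth features; the student is then trained by behavior cloning to imitate the teacher's actions. Everything below is a toy assumption for illustration — linear policies, invented dimensions, and a synthetic observation model — not the paper's Transformer architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (illustrative, not from the paper).
STATE_DIM, OBS_DIM, ACT_DIM = 4, 6, 2

# "Teacher": a fixed linear policy on privileged simulator state.
W_teacher = rng.normal(size=(ACT_DIM, STATE_DIM))

# Observation model: the student's input is a noisy linear projection
# of the state, standing in for visuotactile features.
P = rng.normal(size=(OBS_DIM, STATE_DIM))

def observe(states):
    """Map a batch of states to noisy student observations."""
    return states @ P.T + 0.01 * rng.normal(size=(states.shape[0], OBS_DIM))

# "Student": a linear policy trained to imitate teacher actions
# from observations only (behavior-cloning distillation).
W_student = np.zeros((ACT_DIM, OBS_DIM))

def student_loss(n=256):
    """Mean squared action error of the student vs. the teacher."""
    states = rng.normal(size=(n, STATE_DIM))
    err = observe(states) @ W_student.T - states @ W_teacher.T
    return float(np.mean(err ** 2))

def distill(steps=2000, lr=1e-2, batch=32):
    """Gradient-descent regression of student actions onto teacher actions."""
    global W_student
    for _ in range(steps):
        states = rng.normal(size=(batch, STATE_DIM))
        obs = observe(states)
        residual = obs @ W_student.T - states @ W_teacher.T
        W_student -= lr * (residual.T @ obs) / batch

loss_before = student_loss()
distill()
loss_after = student_loss()
```

The key property the sketch preserves is that the student never touches the privileged state at deployment time, which is what makes this style of distillation compatible with sim-to-real transfer.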
Osher Azulay
School of Mechanical Engineering, Tel-Aviv University, Israel
Dhruv Metha Ramesh
Rutgers University
Nimrod Curtis
School of Mechanical Engineering, Tel-Aviv University, Israel
A. Sintov
School of Mechanical Engineering, Tel-Aviv University, Israel