🤖 AI Summary
Problem: Teaching robots tool manipulation in the real world is hindered by scarce human demonstration data, a large simulation-to-reality (sim-to-real) gap, and the dual-contact interactions inherent in tool use (robot-tool and tool-environment).
Method: We propose a few-shot tool-use skill transfer framework that uses multimodal proximity and tactile sensing. It follows a pretraining-finetuning paradigm: a base policy is first pretrained in simulation to capture contact states common to tool-use skills, then fine-tuned in the real-world target domain using only a few human demonstrations (3–5 trials). Proximity and tactile signals are fused to identify tool-environment contact states and local environmental geometry.
Contribution/Results: On a Franka Emika robot arm, the method teaches surface-following skills with tools of markedly different physical and geometric properties from only 3–5 demonstrations. The analysis suggests that new skills are acquired because the fine-tuned policy inherits the pretrained policy's ability to recognise tool-environment contact relationships, and that combining proximity and tactile sensing improves the identification of contact states and environmental geometry.
📝 Abstract
Tools extend the manipulation abilities of robots, much like they do for humans. Despite human expertise in tool manipulation, teaching these skills to robots is difficult. The complexity arises from the interplay of two simultaneous points of contact: one between the robot and the tool, and another between the tool and the environment. Tactile and proximity sensors play a crucial role in identifying these complex contacts. However, learning tool manipulation using these sensors remains challenging due to limited real-world data and the large sim-to-real gap. To address this, we propose a few-shot tool-use skill transfer framework using multimodal sensing. The framework pre-trains a base policy in simulation to capture contact states common in tool-use skills, then fine-tunes it with human demonstrations collected in the real-world target domain to bridge the domain gap. On a Franka Emika robot arm, we validate that this framework can teach surface-following tasks, using tools with diverse physical and geometric properties, from a small number of demonstrations. Our analysis suggests that the robot acquires new tool-use skills by transferring the ability to recognise tool-environment contact relationships from pre-trained to fine-tuned policies. Additionally, combining proximity and tactile sensors enhances the identification of contact states and environmental geometry.
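The pre-train then fine-tune idea can be illustrated with a toy sketch. This is not the paper's model or data: it assumes a linear softmax classifier in place of the base policy, two hypothetical contact states (0 = approaching, 1 = in contact), synthetic 3-dimensional proximity and tactile features fused by simple concatenation, and a made-up domain gap (a tactile offset and reduced signal gain in the "real" domain). All function names and parameters below are illustrative.

```python
import numpy as np

def fuse(proximity, tactile):
    """Early fusion (an assumption): concatenate the two modalities per sample."""
    return np.concatenate([proximity, tactile], axis=1)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train(W, b, X, y, lr, steps):
    """Gradient descent on softmax cross-entropy; returns updated copies."""
    W, b = W.copy(), b.copy()
    onehot = np.eye(W.shape[1])[y]
    for _ in range(steps):
        grad = (softmax(X @ W + b) - onehot) / len(X)
        W -= lr * X.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

def accuracy(W, b, X, y):
    return float(((X @ W + b).argmax(axis=1) == y).mean())

def make_domain(rng, n, gain, tactile_offset, noise=0.05):
    """Toy data with alternating labels: 0 = approaching, 1 = in contact."""
    y = np.arange(n) % 2
    prox = gain * (1 - y)[:, None] + noise * rng.standard_normal((n, 3))
    tact = gain * y[:, None] + tactile_offset + noise * rng.standard_normal((n, 3))
    return fuse(prox, tact), y

def run_demo(seed=0):
    rng = np.random.default_rng(seed)
    # "Simulation": plentiful data, clean signals.
    X_sim, y_sim = make_domain(rng, 400, gain=1.0, tactile_offset=0.0)
    # "Real" domain: weaker signals plus a tactile bias (hypothetical gap).
    X_few, y_few = make_domain(rng, 6, gain=0.5, tactile_offset=1.0)
    X_test, y_test = make_domain(rng, 200, gain=0.5, tactile_offset=1.0)

    W0, b0 = np.zeros((6, 2)), np.zeros(2)
    W_pre, b_pre = train(W0, b0, X_sim, y_sim, lr=0.5, steps=200)      # pre-train
    W_ft, b_ft = train(W_pre, b_pre, X_few, y_few, lr=0.1, steps=200)  # fine-tune
    return {
        "sim_acc": accuracy(W_pre, b_pre, X_sim, y_sim),
        "real_acc_pretrained": accuracy(W_pre, b_pre, X_test, y_test),
        "real_acc_finetuned": accuracy(W_ft, b_ft, X_test, y_test),
    }

if __name__ == "__main__":
    print(run_demo())
```

In this toy setting, the six labeled "real" samples stand in for a handful of demonstrations: fine-tuning starts from the pre-trained weights rather than from scratch, so only the shift between domains has to be learned from the small real dataset.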