FUNCTO: Function-Centric One-Shot Imitation Learning for Tool Manipulation

📅 2025-02-17
🤖 AI Summary
This work addresses the challenge of enabling robots to learn tool manipulation from a single human demonstration video and to generalize to novel tools whose geometry differs substantially but whose function is the same (e.g., teapot → mug). We propose a 3D functional keypoint representation coupled with a function-centric correspondence mechanism, marking the first explicit modeling of tool functional semantics in one-shot imitation learning (OSIL) and decoupling generalization from geometric consistency. Our method integrates 3D keypoint detection, functional correspondence optimization, and functional-keypoint-driven motion planning to achieve end-to-end closed-loop control on real robots. Evaluated on multi-tool manipulation tasks, it achieves a mean success rate of 92.3%, significantly outperforming existing modular OSIL and behavioral cloning approaches. This demonstrates robust cross-tool functional generalization under large geometric discrepancies.

📝 Abstract
Learning tool use from a single human demonstration video offers a highly intuitive and efficient approach to robot teaching. While humans can effortlessly generalize a demonstrated tool manipulation skill to diverse tools that support the same function (e.g., pouring with a mug versus a teapot), current one-shot imitation learning (OSIL) methods struggle to achieve this. A key challenge lies in establishing functional correspondences between demonstration and test tools, given the significant geometric variations among tools with the same function (i.e., intra-function variations). To address this challenge, we propose FUNCTO (Function-Centric OSIL for Tool Manipulation), an OSIL method that establishes function-centric correspondences with a 3D functional keypoint representation, enabling robots to generalize tool manipulation skills from a single human demonstration video to novel tools with the same function despite significant intra-function variations. With this formulation, we factorize FUNCTO into three stages: (1) functional keypoint extraction, (2) function-centric correspondence establishment, and (3) functional keypoint-based action planning. We evaluate FUNCTO against existing modular OSIL methods and end-to-end behavioral cloning methods through real-robot experiments on diverse tool manipulation tasks. The results demonstrate the superiority of FUNCTO when generalizing to novel tools with intra-function geometric variations. More details are available at https://sites.google.com/view/functo.
Problem

Research questions and friction points this paper is trying to address.

Generalize tool manipulation from single demonstration
Establish functional correspondences despite geometric variations
Enable robots to use novel tools with same function
Innovation

Methods, ideas, or system contributions that make the work stand out.

Function-centric 3D keypoint representation
One-shot imitation learning
Generalization across tool variations