Universal Dexterous Functional Grasping via Demonstration-Editing Reinforcement Learning

📅 2025-12-15
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Addressing the challenge of achieving generalizable functional grasping with dexterous hands on unknown objects—complicated by intricate reward design, inefficient multi-task exploration, and sim-to-real transfer bottlenecks—this paper introduces “Demonstration-Editing Reinforcement Learning,” a novel paradigm. Methodologically, it is the first to decouple grasp style from object functional affordances; leveraging a single human demonstration, it enables one-shot policy editing for zero-shot generalization across unseen objects, functions, and styles. Furthermore, it integrates vision-language models (VLMs) to enable end-to-end planning from natural language instructions to executable grasp actions. Evaluated in both simulation and on real dexterous hardware, the approach improves functional grasp accuracy by 32%, successfully generalizes to 128 previously unseen object–function–style combinations, and supports closed-loop, instruction-driven execution.

Technology Category

Application Category

📝 Abstract
Reinforcement learning (RL) has achieved great success in dexterous grasping, significantly improving grasp performance and generalization from simulation to the real world. However, fine-grained functional grasping, which is essential for downstream manipulation tasks, remains underexplored and faces several challenges: the complexity of specifying goals and reward functions for functional grasps across diverse objects, the difficulty of multi-task RL exploration, and the challenge of sim-to-real transfer. In this work, we propose DemoFunGrasp for universal dexterous functional grasping. We factorize functional grasping conditions into two complementary components - grasping style and affordance - and integrate them into an RL framework that can learn to grasp any object with any functional grasping condition. To address the multi-task optimization challenge, we leverage a single grasping demonstration and reformulate the RL problem as one-step demonstration editing, substantially enhancing sample efficiency and performance. Experimental results in both simulation and the real world show that DemoFunGrasp generalizes to unseen combinations of objects, affordances, and grasping styles, outperforming baselines in both success rate and functional grasping accuracy. In addition to strong sim-to-real capability, by incorporating a vision-language model (VLM) for planning, our system achieves autonomous instruction-following grasp execution.
Problem

Research questions and friction points this paper is trying to address.

Addresses fine-grained functional grasping for downstream manipulation tasks
Solves multi-task RL exploration and sim-to-real transfer challenges
Enables universal grasping across diverse objects and functional conditions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Demonstration-editing RL for functional grasping
Factorizing conditions into style and affordance
Vision-language model for autonomous instruction following
🔎 Similar Papers
No similar papers found.