🤖 AI Summary
Addressing the challenge of achieving efficient dexterous grasping with robotic hands of different configurations under extreme data scarcity (only 40 objects and hundreds of real-world grasps), this paper proposes a two-stage decoupled framework: "geometry-to-contact representation learning + hand-specific decision-making." First, a universal contact representation is learned from scene geometry; second, lightweight, hand-tailored reinforcement learning policies are trained for decision-making. This design drastically reduces data dependency and enables zero-shot generalization to over 150 unseen objects. Evaluated on three heterogeneous robotic hands in cluttered scenes, the method achieves grasp success rates of 75–95%, improving to 80–98% with more training objects. It matches human-level sample efficiency and cross-hand generalization, establishing a scalable paradigm for data-efficient dexterous manipulation.
📝 Abstract
We introduce an efficient approach for learning dexterous grasping with minimal data, advancing robotic manipulation capabilities across different robotic hands. Unlike traditional methods that require millions of grasp labels for each robotic hand, our method achieves high performance with human-level learning efficiency: only hundreds of grasp attempts on 40 training objects. The approach separates the grasping process into two stages: first, a universal model maps scene geometry to intermediate, contact-centric grasp representations that are independent of any specific robotic hand. Next, a distinct grasp decision model is trained for each robotic hand through real-world trial and error, translating these representations into final grasp poses. Our results show a grasp success rate of 75–95% across three different robotic hands in real-world cluttered environments with over 150 novel objects, improving to 80–98% as the number of training objects increases. This adaptable method holds promise for humanoid robots, prosthetics, and other domains requiring robust, versatile manipulation.
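To make the two-stage decoupling concrete, below is a minimal PyTorch sketch of the pipeline's structure: a shared, hand-agnostic model that maps a scene point cloud to a contact-centric grasp representation, and a lightweight per-hand policy that maps that representation to a grasp pose. All class names, network sizes, the point-cloud encoder, and the output parameterization (a 6-DoF wrist pose plus finger joints) are illustrative assumptions for exposition, not the paper's actual architecture.

```python
# Illustrative sketch of the two-stage decoupled grasping pipeline described
# above. Module names, dimensions, and the contact-representation format are
# assumptions; the paper does not specify these details here.
import torch
import torch.nn as nn

class ContactRepresentationModel(nn.Module):
    """Stage 1 (hand-agnostic): maps scene geometry to a contact-centric
    representation. Here: per-point contact scores plus a pooled scene
    feature, from a point cloud encoded by a small PointNet-style MLP."""
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.point_mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        self.contact_head = nn.Linear(feat_dim, 1)  # contact likelihood per point

    def forward(self, points: torch.Tensor):
        # points: (N, 3) scene point cloud
        feats = self.point_mlp(points)                          # (N, feat_dim)
        contact_logits = self.contact_head(feats).squeeze(-1)   # (N,)
        scene_feat = feats.max(dim=0).values                    # global max-pool
        return contact_logits, scene_feat

class HandGraspPolicy(nn.Module):
    """Stage 2 (hand-specific): a lightweight decision model, one per hand,
    that turns the shared representation into a grasp pose. In the paper it
    is trained by real-world trial and error; the RL objective is omitted."""
    def __init__(self, feat_dim: int = 128, num_joints: int = 16):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(),
            nn.Linear(64, 6 + num_joints),  # 6-DoF wrist pose + finger joints
        )

    def forward(self, scene_feat: torch.Tensor):
        return self.head(scene_feat)

# Usage: one shared stage-1 model, one small stage-2 policy per hand
# (hand names and joint counts below are placeholders).
contact_model = ContactRepresentationModel()
policies = {name: HandGraspPolicy(num_joints=nj)
            for name, nj in [("gripper", 1), ("three_finger", 9), ("dexterous", 16)]}

points = torch.rand(1024, 3)                     # stand-in scene point cloud
contact_logits, scene_feat = contact_model(points)
grasp = policies["dexterous"](scene_feat)        # (6 + 16,) grasp parameters
```

The design point the sketch captures is the source of the data efficiency claimed above: stage 1 is trained once and reused across hands, so adapting to a new hand only requires training the small stage-2 head, which is what keeps the per-hand requirement to hundreds of real-world grasp attempts.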