🤖 AI Summary
This work addresses key obstacles to bimanual dexterous grasping: limited generalization, single-strategy policies, and a scarcity of high-quality training data, all of which hinder robust manipulation of objects with diverse shapes, sizes, and weights. The authors propose UltraDexGrasp, a general-purpose dexterous grasping framework for bimanual robots. It constructs a large-scale, multi-strategy synthetic dataset (UltraDexGrasp-20M, comprising 20 million frames across 1,000 objects) through optimization-driven grasp synthesis and planning-driven trajectory generation. Operating on point cloud inputs, the system aggregates scene features via unidirectional attention and learns an end-to-end grasp policy, achieving zero-shot sim-to-real transfer without any real-world training data. Evaluated on previously unseen objects in real-world settings, the method attains an average success rate of 81.2%, and the authors open-source the data generation pipeline to advance research in this domain.
📝 Abstract
Grasping is a fundamental capability for robots to interact with the physical world. Humans, equipped with two hands, autonomously select appropriate grasp strategies based on the shape, size, and weight of objects, enabling robust grasping and subsequent manipulation. In contrast, current robotic grasping remains limited, particularly in multi-strategy settings. Although substantial efforts have targeted parallel-gripper and single-hand grasping, dexterous grasping for bimanual robots remains underexplored, with data being a primary bottleneck. Achieving physically plausible and geometrically conforming grasps that can withstand external wrenches poses significant challenges. To address these issues, we introduce UltraDexGrasp, a framework for universal dexterous grasping with bimanual robots. The proposed data-generation pipeline integrates optimization-based grasp synthesis with planning-based demonstration generation, yielding high-quality and diverse trajectories across multiple grasp strategies. With this framework, we curate UltraDexGrasp-20M, a large-scale, multi-strategy grasp dataset comprising 20 million frames across 1,000 objects. Based on UltraDexGrasp-20M, we further develop a simple yet effective grasp policy that takes point clouds as input, aggregates scene features via unidirectional attention, and predicts control commands. Trained exclusively on synthetic data, the policy achieves robust zero-shot sim-to-real transfer and consistently succeeds on novel objects with varied shapes, sizes, and weights, attaining an average success rate of 81.2% in real-world universal dexterous grasping. To facilitate future research on grasping with bimanual robots, we open-source the data generation pipeline at https://github.com/InternRobotics/UltraDexGrasp.
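The abstract only names the policy's aggregation mechanism ("aggregates scene features via unidirectional attention") without detailing it. As a purely illustrative sketch, not the paper's actual implementation, unidirectional attention can be read as a query token attending over encoded point features in one direction only, so the point features are never updated from the query. All names, shapes, and dimensions below are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def unidirectional_attention(query, point_feats):
    """Aggregate per-point features into one scene feature.

    query:       (1, d) learned policy query token (hypothetical)
    point_feats: (N, d) encoded point cloud features (hypothetical)
    Attention flows one way: the query attends to the points;
    the point features themselves are left unchanged.
    """
    d = query.shape[-1]
    scores = query @ point_feats.T / np.sqrt(d)   # (1, N) similarity scores
    weights = softmax(scores, axis=-1)            # (1, N) attention weights
    return weights @ point_feats                  # (1, d) aggregated scene feature

# Toy usage with random features (hypothetical dimensions).
rng = np.random.default_rng(0)
d = 8
points = rng.normal(size=(128, d))   # stand-in for an encoded point cloud
q = rng.normal(size=(1, d))          # stand-in for a policy query token
scene_feat = unidirectional_attention(q, points)
print(scene_feat.shape)  # (1, 8)
```

In a real policy network the query, keys, and values would be learned projections and the aggregated feature would feed a head that predicts control commands; this sketch only shows the one-directional flow of attention.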