🤖 AI Summary
Existing methods for joint 3D reconstruction of hands and unseen or transparent objects during handovers rely on hand-crafted geometric priors or on depth sensors. The former generalize poorly to unseen objects, and the latter fail to detect transparent ones. Method: We propose a stereo RGB framework for joint hand-object reconstruction featuring: (1) a probabilistic fusion mechanism that combines single-view reconstructions into a coherent stereo reconstruction; (2) a 3D shape prior learned from a large synthetic hand-object dataset to improve generalization to unseen objects; and (3) a projection-based outlier removal step applied to the reconstructed hand-object shape. Contribution/Results: Experiments show a lower object Chamfer distance than existing RGB-based hand-object reconstruction methods in both single-view and stereo settings. The reconstruction also guides a human-to-robot handover pipeline with wide-baseline stereo RGB cameras, enabling a robot to receive a diverse range of household objects.
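The summary does not spell out how the projection-based outlier removal works. One plausible reading, offered here only as a sketch, is that reconstructed 3D points are projected into a camera image and discarded when they land outside a 2D hand-object segmentation mask. Everything below is an assumption for illustration: the pinhole intrinsics `fx`, `fy`, `cx`, `cy`, the binary `mask`, and the function names `project` and `reject_outliers` are hypothetical and not taken from the paper.

```python
import math


def project(point, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame 3D point to pixel coordinates.

    Returns None for points at or behind the camera plane.
    """
    x, y, z = point
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def reject_outliers(points, mask, fx, fy, cx, cy):
    """Keep only points whose projection lands on a foreground mask pixel.

    `mask` is a list of rows of booleans (hypothetical segmentation output).
    """
    height, width = len(mask), len(mask[0])
    kept = []
    for p in points:
        uv = project(p, fx, fy, cx, cy)
        if uv is None:
            continue  # behind the camera: treat as an outlier
        col, row = math.floor(uv[0]), math.floor(uv[1])
        if 0 <= row < height and 0 <= col < width and mask[row][col]:
            kept.append(p)
    return kept


# Toy 4x4 mask whose central 2x2 block is foreground.
mask = [
    [False, False, False, False],
    [False, True, True, False],
    [False, True, True, False],
    [False, False, False, False],
]
points = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
kept = reject_outliers(points, mask, fx=2.0, fy=2.0, cx=1.5, cy=1.5)
```

In a stereo setup, the same check could be repeated for each camera after transforming the points into that camera's frame, keeping only points that pass in every view. Whether the paper does this is not stated in the summary.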
📝 Abstract
Jointly estimating hand and object shape ensures the success of the robot grasp in human-to-robot handovers. However, relying on hand-crafted prior knowledge about the geometric structure of the object fails when generalising to unseen objects, and depth sensors fail to detect transparent objects such as drinking glasses. In this work, we propose a stereo-based method for hand-object reconstruction that combines single-view reconstructions probabilistically to form a coherent stereo reconstruction. We learn 3D shape priors from a large synthetic hand-object dataset to ensure that our method is generalisable, and we use RGB inputs instead of depth because RGB can better capture transparent objects. We show that our method achieves a lower object Chamfer distance than existing RGB-based hand-object reconstruction methods in both single-view and stereo settings. We process the reconstructed hand-object shape with a projection-based outlier removal step and use the output to guide a human-to-robot handover pipeline with wide-baseline stereo RGB cameras. Our hand-object reconstruction enables a robot to successfully receive a diverse range of household objects from a human.
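For readers unfamiliar with the metric, the object Chamfer distance reported above compares a predicted point set with a ground-truth point set by averaging nearest-neighbour distances in both directions. The sketch below uses one common convention (unsquared Euclidean distance, mean over each set, sum of the two directions). The abstract does not say which variant the authors use, so the exact normalisation here is an assumption, and the function name `chamfer_distance` is chosen for illustration.

```python
import math


def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance between two non-empty 3D point sets.

    Brute-force O(len(a) * len(b)); fine for small illustrative inputs.
    """
    if not points_a or not points_b:
        raise ValueError("both point sets must be non-empty")

    def mean_nearest(src, dst):
        # Average distance from each source point to its closest target point.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return mean_nearest(points_a, points_b) + mean_nearest(points_b, points_a)


predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
ground_truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
score = chamfer_distance(predicted, ground_truth)
```

A lower value means the predicted surface lies closer to the ground truth. Identical point sets give a distance of zero.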