🤖 AI Summary
To address the scarcity of 3D bimanual hand–object interaction data, this paper introduces BG-HOP, presented as the first generative prior designed specifically for this task. Methodologically, the authors extend existing single-hand generative priors to bimanual coordination, explicitly modeling the joint distribution of the left hand, right hand, and object. Built on a diffusion-based framework, the model integrates geometrically aware representations, including hand skeletal poses, mesh reconstructions, and object signed distance functions (SDFs), and introduces cross-hand attention to capture inter-hand dependencies. It supports conditional synthesis of bimanual poses and grasps for arbitrary object geometries. Despite being trained on only limited bimanual interaction data, BG-HOP generates physically plausible hand–object configurations, improving both the plausibility and diversity of object-adaptive grasping. Code and pretrained models are publicly released.
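To make the cross-hand attention idea concrete, here is a minimal PyTorch sketch of bidirectional attention between left- and right-hand feature tokens. The module name, tensor shapes, and residual layout are illustrative assumptions, not the released BG-HOP implementation.

```python
# Hypothetical sketch of cross-hand attention: each hand's tokens attend to
# the other hand's tokens so a denoiser can model inter-hand dependencies.
# (Assumed module/shape choices; not the paper's actual architecture.)
import torch
import torch.nn as nn

class CrossHandAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # One attention block per direction: left->right and right->left.
        self.attn_l = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.attn_r = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_l = nn.LayerNorm(dim)
        self.norm_r = nn.LayerNorm(dim)

    def forward(self, left: torch.Tensor, right: torch.Tensor):
        # left, right: (batch, tokens, dim) feature sequences for each hand.
        # Left-hand queries attend to right-hand keys/values and vice versa;
        # residual connections keep each hand's own features intact.
        l2r, _ = self.attn_l(self.norm_l(left), right, right)
        r2l, _ = self.attn_r(self.norm_r(right), left, left)
        return left + l2r, right + r2l

# Toy usage: 21 joint tokens per hand with 128-d features.
left = torch.randn(2, 21, 128)
right = torch.randn(2, 21, 128)
block = CrossHandAttention(dim=128)
left_out, right_out = block(left, right)
print(left_out.shape, right_out.shape)  # torch.Size([2, 21, 128]) each
```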
📝 Abstract
In this work, we present BG-HOP, a generative prior that seeks to model bimanual hand-object interactions in 3D. We address the challenge of limited bimanual interaction data by extending existing single-hand generative priors, demonstrating preliminary results in capturing the joint distribution of hands and objects. Our experiments showcase the model's capability to generate bimanual interactions and synthesize grasps for given objects. We make code and models publicly available.
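As an illustration of how a diffusion prior can synthesize grasps for a given object, below is a hedged DDPM-style ancestral-sampling sketch. The `denoiser` network, its conditioning on an object feature vector, and the pose parameterization are hypothetical placeholders, not BG-HOP's released API.

```python
# Sketch of object-conditioned bimanual grasp sampling with a diffusion prior.
# All names and dimensions here are assumptions for illustration only.
import torch

class ToyDenoiser(torch.nn.Module):
    """Stand-in noise predictor so the sketch runs end to end."""
    def __init__(self, pose_dim: int, cond_dim: int):
        super().__init__()
        self.net = torch.nn.Linear(pose_dim + cond_dim + 1, pose_dim)

    def forward(self, x, t, cond):
        # Concatenate noisy pose, object feature, and a scalar timestep.
        t_feat = t.float().view(1, 1).expand(x.shape[0], 1) / 1000.0
        return self.net(torch.cat([x, cond, t_feat], dim=-1))

@torch.no_grad()
def sample_bimanual_grasp(denoiser, obj_feat, pose_dim, steps=50):
    # Standard DDPM ancestral sampling with a linear beta schedule.
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(1, pose_dim)  # noisy left+right hand pose parameters
    for t in reversed(range(steps)):
        eps = denoiser(x, torch.tensor([t]), obj_feat)  # predicted noise
        mean = (x - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps) \
               / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x  # sampled bimanual pose parameters for the given object

# Usage: pose_dim = 2 * 51 is a placeholder for two hands' parameters;
# obj_feat stands in for an object SDF embedding.
obj_feat = torch.randn(1, 32)
denoiser = ToyDenoiser(pose_dim=102, cond_dim=32)
grasp = sample_bimanual_grasp(denoiser, obj_feat, pose_dim=102)
print(grasp.shape)  # torch.Size([1, 102])
```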