🤖 AI Summary
Existing monocular hand-object reconstruction methods typically rely on object templates or assume full object visibility, rendering them ineffective under the severe occlusions that limited viewpoints cause in real-world scenarios. This paper introduces the first template-free reconstruction framework to integrate a novel-view synthesis diffusion model as a 3D prior: diffusion priors implicitly regularize the geometric plausibility of occluded regions, while hand and object geometry are jointly optimized through physical contact constraints and visible-region geometric alignment. Evaluated on short monocular video inputs, the method significantly improves shape completeness under occlusion and interaction plausibility, outperforming state-of-the-art methods on multiple challenging hand-object interaction benchmarks. The core contribution is the first incorporation of large-scale generative diffusion models into hand-object reconstruction, demonstrating their efficacy and generalizability as strong implicit surface regularizers.
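The summary mentions jointly optimizing hand and object geometry under physical contact constraints, but gives no formulas. A common way to express such a constraint is an attraction term that pulls expected contact points onto the object surface plus a penalty on interpenetration. The sketch below is an illustrative toy formulation, not the paper's actual loss; the function name `contact_loss`, the signed-distance callable, and the weighting are all assumptions.

```python
import numpy as np

def contact_loss(hand_pts, obj_pts, obj_sdf, contact_mask, pen_weight=10.0):
    """Toy hand-object contact objective (illustrative, not the paper's loss).

    hand_pts     : (N, 3) hand surface points
    obj_pts      : (M, 3) sampled object surface points
    obj_sdf      : callable (N, 3) -> (N,) signed distance to the object
                   (negative inside the object)
    contact_mask : (N,) bool, hand points expected to touch the object
    """
    # Attraction: expected contact points should lie on the object surface
    # (measured as nearest-neighbor distance to sampled surface points).
    d = np.linalg.norm(hand_pts[:, None, :] - obj_pts[None, :, :], axis=-1)
    attract = d.min(axis=1)[contact_mask].mean() if contact_mask.any() else 0.0
    # Repulsion: penalize hand points that penetrate the object (sdf < 0).
    penetration = np.clip(-obj_sdf(hand_pts), 0.0, None).mean()
    return attract + pen_weight * penetration
```

In an optimization loop, a differentiable version of this term would be minimized jointly with the visible-region geometric alignment described above.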
📝 Abstract
Most RGB-based hand-object reconstruction methods rely on object templates, while template-free methods typically assume full object visibility. This assumption often breaks in real-world settings, where fixed camera viewpoints and static grips leave parts of the object unobserved, resulting in implausible reconstructions. To overcome this, we present MagicHOI, a method for reconstructing hands and objects from short monocular interaction videos, even under limited viewpoint variation. Our key insight is that, despite the scarcity of paired 3D hand-object data, large-scale novel view synthesis diffusion models offer rich object supervision, which serves as a prior to regularize unseen object regions during hand interactions. Leveraging this insight, we integrate a novel view synthesis model into our hand-object reconstruction framework. We further align the hand to the object by incorporating visible contact constraints. Our results demonstrate that MagicHOI significantly outperforms existing state-of-the-art hand-object reconstruction methods. We also show that novel view synthesis diffusion priors effectively regularize unseen object regions, enhancing 3D hand-object reconstruction.
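The abstract does not say how the novel view synthesis diffusion model supervises unseen regions. One common mechanism for distilling a 2D diffusion prior into 3D geometry is a score-distillation-style gradient, sketched below with a stub denoiser. Everything here is an assumption for illustration (the function names, the weighting `w`, and the stub `denoiser`); it is not MagicHOI's actual objective.

```python
import numpy as np

def sds_gradient(render, denoiser, t, alpha_bar, rng):
    """Score-distillation-style gradient for one rendered novel view (toy sketch).

    render    : (H, W, 3) image rendered from the current 3D scene parameters
    denoiser  : stub for the diffusion model's noise predictor eps_phi(x_t, t)
    t         : diffusion timestep
    alpha_bar : cumulative noise-schedule value at t, in (0, 1)
    """
    eps = rng.standard_normal(render.shape)                  # sample Gaussian noise
    x_t = np.sqrt(alpha_bar) * render + np.sqrt(1.0 - alpha_bar) * eps
    eps_hat = denoiser(x_t, t)                               # model's predicted noise
    w = 1.0 - alpha_bar                                      # a common weighting choice
    # In practice this image-space term w * (eps_hat - eps) is backpropagated
    # through a differentiable renderer to the 3D parameters; here we return it.
    return w * (eps_hat - eps)
```

Intuitively, when a rendered unseen view already looks plausible to the diffusion model, the predicted noise matches the injected noise and the gradient vanishes; otherwise the gradient pushes the geometry toward renders the prior considers likely.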