GeoHand: Unlocking Prior Geometry Knowledge for Monocular 3D Hand Reconstruction

📅 2026-05-17
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
Monocular 3D hand reconstruction suffers from severe geometric ambiguity under self-occlusion and hand-object interaction, where RGB appearance alone is often insufficient to recover fine structural detail. This work proposes GeoHand, a framework that draws geometric priors from MoGe2, a frozen general-purpose monocular geometry estimator. A map-level GeoAdapter recalibrates MoGe2's spatial features for hands, a gated cross-modal token fusion step injects the adapted priors without overwhelming RGB appearance cues, and a Keypoint-Queried Iterative Refiner (KQIR) uses projected joint locations to query geometry-aware image features for local correction. The authors report state-of-the-art results on FreiHAND, DexYCB, and HO3Dv3, especially under severe occlusions and hand-object interactions.
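To make the gated fusion idea concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the summary does not specify how the gate is computed, so the residual form `rgb + gate * geo`, the function name `gated_fuse`, and the parameters `gate_weights` and `gate_bias` are all assumptions chosen for illustration. The point it shows is that a sigmoid gate can admit geometry features channel by channel while leaving the RGB token intact when the gate closes.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real logit to a gate value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fuse(rgb_token, geo_token, gate_weights, gate_bias):
    """Fuse one RGB token with one geometry token (hypothetical formulation).

    rgb_token, geo_token: lists of C floats.
    gate_weights: C rows, each of length 2*C, applied to concat(rgb, geo).
    gate_bias: list of C floats.
    Returns a list of C floats: rgb + gate * geo, computed per channel.
    """
    concat = list(rgb_token) + list(geo_token)
    fused = []
    for c, (r, g) in enumerate(zip(rgb_token, geo_token)):
        logit = sum(w * x for w, x in zip(gate_weights[c], concat)) + gate_bias[c]
        gate = sigmoid(logit)
        # Residual form: the RGB feature always passes through unchanged,
        # and the gate decides how much geometry is added on top.
        fused.append(r + gate * g)
    return fused


rgb = [1.0, 2.0]
geo = [4.0, -2.0]
zero_weights = [[0.0] * 4 for _ in range(2)]

half_open = gated_fuse(rgb, geo, zero_weights, [0.0, 0.0])
closed = gated_fuse(rgb, geo, zero_weights, [-50.0, -50.0])
```

With zero weights and zero bias the gate sits at 0.5, so half of each geometry channel is added. With a strongly negative bias the gate closes and the output is effectively the RGB token alone, which is the "without overwhelming RGB appearance cues" behaviour the summary describes.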
📝 Abstract
Monocular 3D hand reconstruction is intrinsically a geometric problem, yet RGB appearance features alone often struggle to resolve severe ambiguities caused by self-occlusions and hand-object interactions. While introducing depth can explicitly provide spatial cues, raw sensor-captured depth maps are often noisy and incomplete, limiting their usefulness for fine-grained hand reconstruction. To bridge this gap, we propose GeoHand, a novel framework that unlocks high-quality geometric priors from a frozen foundational monocular geometry estimator (MoGe2). Recognizing that these priors are oriented toward general scenes, we introduce a map-level GeoAdapter to recalibrate the spatial features, specifically adapting them for detailed hand reconstruction. Furthermore, to systematically integrate these adapted priors without overwhelming intrinsic RGB appearance cues, we employ a gated cross-modal token fusion strategy. Finally, to secure precise local articulation, we design a Keypoint-Queried Iterative Refiner (KQIR) that uses projected joint locations to query geometry-aware image features for spatial correction. By combining global geometric disambiguation with local refinement in a unified pipeline, GeoHand achieves state-of-the-art performance on FreiHAND, DexYCB, and HO3Dv3, especially under severe occlusions and hand-object interactions.
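The keypoint-query step in KQIR can be illustrated with a small sketch in plain Python. It covers only the part the abstract states, namely projecting joint locations into the image and reading features at those points. The pinhole camera model, the bilinear interpolation, and the names `project`, `bilinear_sample`, and `query_joint_features` are assumptions made for illustration. The learned update that would turn the sampled features into a spatial correction, and the iteration loop around it, are not described in the abstract and are left out.

```python
import math


def project(joint_xyz, focal, cx, cy):
    """Pinhole projection of a 3D joint to pixel coordinates (u = column, v = row)."""
    x, y, z = joint_xyz
    return focal * x / z + cx, focal * y / z + cy


def bilinear_sample(feature_map, u, v):
    """Sample feature_map[row][col][channel] at a continuous location (u, v).

    Coordinates are clamped to the map so that joints projected outside
    the image still return the nearest border feature.
    """
    h, w = len(feature_map), len(feature_map[0])
    u = min(max(u, 0.0), w - 1.0)
    v = min(max(v, 0.0), h - 1.0)
    x0, y0 = int(math.floor(u)), int(math.floor(v))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = u - x0, v - y0
    out = []
    for c in range(len(feature_map[0][0])):
        top = (1 - ax) * feature_map[y0][x0][c] + ax * feature_map[y0][x1][c]
        bottom = (1 - ax) * feature_map[y1][x0][c] + ax * feature_map[y1][x1][c]
        out.append((1 - ay) * top + ay * bottom)
    return out


def query_joint_features(joints_xyz, feature_map, focal, cx, cy):
    """Return one sampled feature vector per joint (the 'query' result)."""
    return [
        bilinear_sample(feature_map, *project(joint, focal, cx, cy))
        for joint in joints_xyz
    ]


# Toy 2x2 feature map with a single channel, and three hypothetical joints.
feature_map = [[[0.0], [1.0]],
               [[2.0], [3.0]]]
joints = [(0.5, 0.5, 1.0), (0.0, 0.0, 2.0), (1.0, 0.0, 1.0)]
queried = query_joint_features(joints, feature_map, focal=1.0, cx=0.0, cy=0.0)
```

In a refinement loop of the kind the abstract describes, the current joint estimates would be projected, features sampled as above, and a learned module would predict a correction before the next round. That module is outside the scope of this sketch.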
Problem

Research questions and friction points this paper is trying to address.

monocular 3D hand reconstruction
self-occlusions
hand-object interactions
geometric ambiguity
depth noise
Innovation

Methods, ideas, or system contributions that make the work stand out.

geometric prior
GeoAdapter
gated cross-modal fusion
Keypoint-Queried Iterative Refiner
monocular 3D hand reconstruction