🤖 AI Summary
This work addresses ambiguity in human mesh recovery: under occlusion or weak depth cues, distal joint poses are poorly constrained by the image. The authors propose FactorizedHMR, a two-stage hybrid framework in which a deterministic regression module first recovers a stable torso-root anchor and a probabilistic flow-matching module then generates plausible poses for the remaining non-torso body parts. The method treats the torso and the distal joints differently, combining a composite target representation, geometry-aware supervision, and feature-aware classifier-free guidance. A synthetic data pipeline supplies paired image-camera-motion supervision under diverse viewpoints. On camera-space and world-space benchmarks, the approach remains competitive with strong baselines, with the clearest gains under heavy occlusion and on drift-sensitive world-space metrics.
📝 Abstract
Human Mesh Recovery (HMR) is fundamentally ambiguous: under occlusion or weak depth cues, multiple 3D bodies can explain the same image evidence. This ambiguity is not uniform across the body, as torso pose and root structure are often relatively well constrained, whereas distal articulations such as the arms and legs are more uncertain. Building on this observation, we propose FactorizedHMR, a two-stage framework that treats these two regimes differently. A deterministic regression module first recovers a stable torso-root anchor, and a probabilistic flow-matching module then completes the remaining non-torso articulation. To make this completion reliable, we combine a composite target representation with geometry-aware supervision and feature-aware classifier-free guidance, preserving the torso-root anchor while improving single-reference recovery of ambiguity-prone articulation. We also introduce a synthetic data pipeline that provides paired image-camera-motion supervision under diverse viewpoints. Across camera-space and world-space benchmarks, FactorizedHMR remains competitive with strong baselines, with the clearest gains in occlusion-heavy recovery and drift-sensitive world-space metrics.
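To make the second stage concrete, the following is a minimal sketch of how a flow-matching sampler with classifier-free guidance could complete distal articulation around a fixed torso-root anchor. It is not the authors' implementation. The names `sample_distal`, `guided_velocity`, and `toy_velocity`, the Euler integrator, the step count, and the guidance weight `w` are all assumptions made for illustration, and the guidance shown is the standard form that drops the conditioning features for the unconditional branch; the abstract does not specify how the feature-aware variant differs.

```python
import numpy as np


def guided_velocity(v_cond, v_uncond, w):
    """Standard classifier-free guidance: move from the unconditional
    velocity toward the conditional one, scaled by the weight w."""
    return v_uncond + w * (v_cond - v_uncond)


def sample_distal(velocity_fn, anchor, feat, dim, steps=20, w=2.0, seed=0):
    """Integrate a velocity field from noise (t=0) to a pose sample (t=1)
    with fixed-step Euler updates. The anchor is passed to every call and
    is never modified, so stage one's output is preserved."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(dim)  # initial noise sample
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v_c = velocity_fn(x, t, anchor, feat)  # conditioned on image features
        v_u = velocity_fn(x, t, anchor, None)  # features dropped
        x = x + dt * guided_velocity(v_c, v_u, w)
    return x


def toy_velocity(x, t, anchor, feat):
    """Hypothetical stand-in for a learned network. For a straight-line
    path toward a single target, the exact velocity is (target - x) / (1 - t)."""
    target = anchor if feat is None else anchor + feat
    return (target - x) / (1.0 - t)


if __name__ == "__main__":
    anchor = np.array([1.0, -2.0, 0.5])  # stands in for the stage-one output
    feat = np.array([0.2, 0.1, -0.3])    # stands in for image features
    print(sample_distal(toy_velocity, anchor, feat, dim=3, w=1.0))
```

With the toy field, a guidance weight of 1 recovers the conditional target `anchor + feat`, and larger weights extrapolate further away from the unconditional target. In a real system `velocity_fn` would be a trained network, and different seeds would yield different plausible completions for the same anchor.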