MetricHMR: Metric Human Mesh Recovery from Monocular Images

📅 2025-06-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Monocular human mesh recovery (HMR) suffers from scale and depth ambiguity, which makes it hard to reconstruct geometrically plausible 3D meshes at metric scale (in meters) with accurate global translation. This work systematically analyzes the camera models used by previous HMR methods, emphasizes the critical role of the standard perspective projection model for metric-scale HMR, and validates the acceptable ambiguity range under that model. It then proposes an end-to-end approach built on a ray map, derived from the standard perspective projection, that jointly encodes bounding-box information, camera parameters, and geometric cues, with no additional metric-regularization modules. Experiments show state-of-the-art performance in metric pose, shape, and global translation estimation across indoor and in-the-wild scenes, even compared with sequential HMR methods.

📝 Abstract
We introduce MetricHMR (Metric Human Mesh Recovery), an approach for metric human mesh recovery with accurate global translation from monocular images. In contrast to existing HMR methods, which suffer from severe scale and depth ambiguity, MetricHMR produces geometrically reasonable body shape and global translation in its reconstructions. To this end, we first systematically analyze the camera models used by previous HMR methods to emphasize the critical role of the standard perspective projection model in enabling metric-scale HMR. We then validate the acceptable ambiguity range of metric HMR under the standard perspective projection model. Finally, we contribute a novel approach that introduces a ray map, based on the standard perspective projection, to jointly encode bounding-box information, camera parameters, and geometric cues for end-to-end metric HMR without any additional metric-regularization modules. Extensive experiments demonstrate that our method achieves state-of-the-art performance, even compared with sequential HMR methods, in metric pose, shape, and global translation estimation across both indoor and in-the-wild scenarios.
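The abstract does not spell out how the ray map is built. As a rough illustration only, the sketch below shows one generic way a ray map under the standard perspective (pinhole) model could carry both bounding-box and camera information: each pixel of the person crop is mapped back to full-image coordinates and unprojected through the intrinsics. The function name `crop_ray_map`, its arguments, and the unit-direction encoding are assumptions made for this example, not the paper's formulation.

```python
import numpy as np

def crop_ray_map(K, bbox, out_size):
    """Hypothetical helper: per-pixel unit ray directions for a bounding-box crop.

    K        : (3, 3) pinhole intrinsics of the full image (assumed known).
    bbox     : (x0, y0, x1, y1) crop box in full-image pixel coordinates.
    out_size : (height, width) of the resized crop fed to a network.
    Returns an array of shape (height, width, 3).
    """
    x0, y0, x1, y1 = bbox
    h, w = out_size
    # Centers of the crop's pixels, expressed in full-image coordinates.
    us = x0 + (np.arange(w) + 0.5) * (x1 - x0) / w
    vs = y0 + (np.arange(h) + 0.5) * (y1 - y0) / h
    uu, vv = np.meshgrid(us, vs)  # each of shape (h, w)
    pixels = np.stack([uu, vv, np.ones_like(uu)], axis=-1)
    # Unproject: ray = K^{-1} [u, v, 1]^T, applied to every pixel at once.
    rays = pixels @ np.linalg.inv(K).T
    # Normalize to unit length so each entry is a pure direction.
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)
    return rays

# Example with made-up intrinsics and a box centered on the principal point.
K = np.array([[1000.0, 0.0, 320.0],
              [0.0, 1000.0, 240.0],
              [0.0, 0.0, 1.0]])
rays = crop_ray_map(K, bbox=(220, 140, 420, 340), out_size=(5, 5))
```

Because the rays depend on the focal length, the principal point, and the position and size of the box, a map like this changes whenever any of them changes, which is the kind of joint encoding the abstract describes.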
Problem

Research questions and friction points this paper is trying to address.

Recovering metric-scale human meshes from monocular images
Resolving scale and depth ambiguity in HMR methods
Estimating accurate global translation and body shape
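For readers unfamiliar with the ambiguity named above, the following textbook relations (not taken from the paper; the symbols $f_x, f_y, c_x, c_y, s, t_x, t_y$ are the usual pinhole and weak-perspective parameters) show why a single image cannot fix absolute scale by geometry alone, and why a weak-perspective camera folds depth and size into one scale factor.

```latex
% Standard perspective (pinhole) projection of a camera-frame point (X, Y, Z):
u = f_x \frac{X}{Z} + c_x, \qquad v = f_y \frac{Y}{Z} + c_y

% Scale-depth ambiguity: for any \alpha > 0 the scaled point projects to the same pixel,
f_x \frac{\alpha X}{\alpha Z} + c_x = u, \qquad f_y \frac{\alpha Y}{\alpha Z} + c_y = v

% Weak-perspective approximation (all points share one mean depth \bar{Z}):
u \approx s X + t_x, \qquad v \approx s Y + t_y, \qquad s = \frac{f}{\bar{Z}}
```

In the weak-perspective form, focal length and depth appear only through the single factor $s$, so they cannot be separated from the projection alone.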
Innovation

Methods, ideas, or system contributions that make the work stand out.

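Uses the standard perspective projection model
Introduces a ray map that jointly encodes bounding-box information, camera parameters, and geometric cues
Achieves end-to-end metric-scale HMR without additional metric-regularization modules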
Authors
He Zhang
Tsinghua University
Chentao Song
Tsinghua University
Hongwen Zhang
Beijing Normal University
Computer Vision, Computer Graphics, 3D Vision, Virtual Humans, Digital Humans
Tao Yu
Tsinghua University