MetricHMR: Metric Human Mesh Recovery from Monocular Images

📅 2025-06-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Monocular human mesh recovery (HMR) suffers from scale and depth ambiguity, which makes it hard to reconstruct geometrically plausible 3D meshes at metric scale (in meters) with accurate global translation. This work analyzes the camera models used by previous HMR methods and argues that the standard perspective projection model is critical for metric-scale HMR, then validates the acceptable ambiguity range under that model. It proposes an end-to-end framework built on a ray map, a representation derived from the standard perspective projection that jointly encodes bounding-box information, camera parameters, and geometric cues, so that no additional metric-regularization modules are needed. The method estimates metric pose, shape, and global translation in both indoor and in-the-wild scenes, and the reported experiments show state-of-the-art performance, even compared with sequential HMR methods.

📝 Abstract

We introduce MetricHMR (Metric Human Mesh Recovery), an approach for metric human mesh recovery with accurate global translation from monocular images. In contrast to existing HMR methods, which suffer from severe scale and depth ambiguity, MetricHMR produces geometrically reasonable body shape and global translation in its reconstructions. To this end, we first systematically analyze the camera models used by previous HMR methods to emphasize the critical role of the standard perspective projection model in enabling metric-scale HMR. We then validate the acceptable ambiguity range of metric HMR under the standard perspective projection model. Finally, we contribute a novel approach that introduces a ray map based on the standard perspective projection to jointly encode bounding-box information, camera parameters, and geometric cues for end-to-end metric HMR without any additional metric-regularization modules. Extensive experiments demonstrate that our method achieves state-of-the-art performance, even compared with sequential HMR methods, in metric pose, shape, and global translation estimation across both indoor and in-the-wild scenarios.

Problem

Research questions and friction points this paper is trying to address.

Recovering metric human mesh from monocular images

Resolving scale and depth ambiguity in HMR methods

Estimating accurate global translation and body shape
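
The scale and depth ambiguity listed above can be illustrated with a small sketch. It is not taken from the paper: the `project` helper, the three "body" points, and the camera values are all hypothetical. It only shows the general pinhole-camera fact that scaling a body and its distance from the camera by the same factor leaves the image unchanged.

```python
import numpy as np

def project(points, focal, center):
    """Pinhole (perspective) projection of Nx3 camera-frame points to Nx2 pixels."""
    points = np.asarray(points, dtype=float)
    return focal * points[:, :2] / points[:, 2:3] + np.asarray(center, dtype=float)

# Hypothetical body points in meters, roughly 3 m in front of the camera.
body = np.array([[0.0, -0.9, 3.0],
                 [0.2,  0.0, 3.1],
                 [-0.2, 0.8, 2.9]])
focal, center = 1000.0, (640.0, 360.0)  # assumed intrinsics, in pixels

small_near = project(body, focal, center)
# Scaling every coordinate by 1.5 makes the body 1.5x larger and 1.5x farther away.
large_far = project(1.5 * body, focal, center)

# Both configurations land on the same pixels, so the image alone cannot
# distinguish them without extra cues or priors.
same_image = np.allclose(small_near, large_far)
```

Here `same_image` evaluates to `True`, which is why a single image constrains metric size and global translation only up to additional information.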

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses standard perspective projection model

Introduces ray map for geometric encoding

Achieves metric-scale HMR without additional metric-regularization modules
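
This page does not describe how MetricHMR builds its ray map, so the following is only a generic sketch of the idea: a grid of unit ray directions, computed with the standard perspective model, for pixels sampled inside a person's bounding box. The function name `bbox_ray_map`, the grid size, and the example box and intrinsics are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def bbox_ray_map(bbox, intrinsics, out_size=16):
    """Unit ray directions for a grid of pixel centers sampled inside a bounding box.

    bbox:       (x_min, y_min, x_max, y_max) in full-image pixel coordinates
    intrinsics: (fx, fy, cx, cy) of a pinhole camera, in pixels
    returns:    array of shape (out_size, out_size, 3)
    """
    x0, y0, x1, y1 = bbox
    fx, fy, cx, cy = intrinsics
    # Centers of an out_size x out_size grid of cells covering the box.
    steps = (np.arange(out_size) + 0.5) / out_size
    us = x0 + steps * (x1 - x0)
    vs = y0 + steps * (y1 - y0)
    u, v = np.meshgrid(us, vs)  # u varies along columns, v along rows
    # Back-project each pixel to a ray through the camera center.
    dirs = np.stack([(u - cx) / fx, (v - cy) / fy, np.ones_like(u)], axis=-1)
    return dirs / np.linalg.norm(dirs, axis=-1, keepdims=True)

# Hypothetical person box in a 1280x720 image with assumed intrinsics.
ray_map = bbox_ray_map((500, 200, 780, 680), (1000.0, 1000.0, 640.0, 360.0))
```

Because each ray depends on the box position, the box size, and the intrinsics at once, a map like this carries all three in one tensor that a network could take as input alongside the cropped image.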

🔎 Similar Papers

MEGA: Masked Generative Autoencoder for Human Mesh Recovery

2024-05-29 | arXiv.org | Citations: 0
