SAM 3D Body: Robust Full-Body Human Mesh Recovery

2026-02-17
Citations: 0
Influential: 0
AI Summary
This work addresses the limited generalization and inconsistent accuracy of single-image full-body 3D human reconstruction in complex in-the-wild scenes. To this end, we propose an encoder-decoder framework that accepts multimodal user-provided cues, such as 2D keypoints and segmentation masks, to enable high-fidelity, user-guided reconstruction of full-body pose and shape, including hands and feet. A key innovation is the Momentum Human Rig (MHR), a new parametric model that decouples skeletal structure from surface geometry. We further develop a multi-stage annotation pipeline that combines manual labeling, differentiable optimization, multi-view geometry, and dense keypoint detection, coupled with an efficient data curation strategy. Extensive experiments show that our method significantly outperforms existing approaches across diverse real-world scenarios, achieving state-of-the-art performance in both quantitative metrics and user preference studies. The model and the MHR representation are publicly released.
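The central modeling idea above is that MHR keeps skeletal structure and surface shape as separate parameter sets. MHR's actual parameterization is not described on this page, so the sketch below is only a hypothetical toy, not MHR: a planar two-bone chain in which bone-length scales place the joints and shape coefficients set surface thickness. Changing the shape coefficients never moves the skeleton, and rescaling bones never changes thickness. `ToyRig`, `pose_model`, and all parameter names are illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical toy rig, NOT the MHR parameterization: a planar two-bone chain
# whose skeleton and surface are driven by separate parameter vectors.


@dataclass(frozen=True)
class ToyRig:
    rest_bone_lengths: tuple[float, ...]  # template skeleton (e.g., upper arm, forearm)
    rest_thickness: tuple[float, ...]     # template surface distance from each bone


def pose_model(rig, bone_scales, shape_coeffs):
    """Return (joints, surface_points) for the given skeleton and shape parameters."""
    # Skeleton: joint positions depend only on bone_scales.
    joints = [(0.0, 0.0)]
    x = 0.0
    for length, scale in zip(rig.rest_bone_lengths, bone_scales):
        x += length * scale
        joints.append((x, 0.0))
    # Surface: one point above each bone midpoint. Its height depends only on
    # shape_coeffs; it follows the skeleton horizontally but never moves it.
    surface = []
    for (start, end), r, c in zip(zip(joints, joints[1:]), rig.rest_thickness, shape_coeffs):
        mid_x = 0.5 * (start[0] + end[0])
        surface.append((mid_x, r * (1.0 + c)))
    return joints, surface


# Usage: vary shape and skeleton independently.
rig = ToyRig(rest_bone_lengths=(0.30, 0.25), rest_thickness=(0.05, 0.04))
slim_joints, slim_surface = pose_model(rig, bone_scales=(1.0, 1.0), shape_coeffs=(0.0, 0.0))
bulky_joints, bulky_surface = pose_model(rig, bone_scales=(1.0, 1.0), shape_coeffs=(0.5, 0.5))
long_joints, long_surface = pose_model(rig, bone_scales=(1.2, 1.2), shape_coeffs=(0.0, 0.0))
```

In a coupled model, joints would instead be derived from the shaped surface, so a "bulkier" body would also shift the skeleton; keeping the two inputs separate is what makes the toy decoupled.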

๐Ÿ“ Abstract
We introduce SAM 3D Body (3DB), a promptable model for single-image full-body 3D human mesh recovery (HMR) that demonstrates state-of-the-art performance, with strong generalization and consistent accuracy in diverse in-the-wild conditions. 3DB estimates the human pose of the body, feet, and hands. It is the first model to use a new parametric mesh representation, Momentum Human Rig (MHR), which decouples skeletal structure and surface shape. 3DB employs an encoder-decoder architecture and supports auxiliary prompts, including 2D keypoints and masks, enabling user-guided inference similar to the SAM family of models. We derive high-quality annotations from a multi-stage annotation pipeline that uses various combinations of manual keypoint annotation, differentiable optimization, multi-view geometry, and dense keypoint detection. Our data engine efficiently selects and processes data to ensure data diversity, collecting unusual poses and rare imaging conditions. We present a new evaluation dataset organized by pose and appearance categories, enabling nuanced analysis of model behavior. Our experiments demonstrate superior generalization and substantial improvements over prior methods in both qualitative user preference studies and traditional quantitative analysis. Both 3DB and MHR are open-source.
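The abstract says 3DB accepts 2D keypoints and masks as auxiliary prompts "similar to the SAM family of models" but does not say how prompts are encoded. The sketch below is an assumption, loosely modeled on SAM-style point prompts: each 2D keypoint becomes a token built from a random-Fourier positional encoding of its normalized image coordinates plus a per-joint type embedding. The function names, feature size, and random placeholder embeddings (which a real model would learn) are all hypothetical, and mask prompts are omitted.

```python
import math
import random

NUM_FEATS = 8  # hypothetical number of Fourier frequencies; token size is 2 * NUM_FEATS


def make_fourier_matrix(num_feats, seed=0):
    """Fixed 2 x num_feats Gaussian projection used for positional encoding."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(num_feats)] for _ in range(2)]


def encode_point(x, y, width, height, gauss):
    """Map pixel coords to [-1, 1], project, and return [sin..., cos...] features."""
    u = 2.0 * (x / width) - 1.0
    v = 2.0 * (y / height) - 1.0
    proj = [2.0 * math.pi * (u * gauss[0][k] + v * gauss[1][k]) for k in range(len(gauss[0]))]
    return [math.sin(p) for p in proj] + [math.cos(p) for p in proj]


def encode_keypoint_prompts(keypoints, width, height, gauss, type_embeddings):
    """keypoints: list of (x, y, joint_id); returns one token (list of floats) per keypoint."""
    tokens = []
    for x, y, joint_id in keypoints:
        pe = encode_point(x, y, width, height, gauss)
        emb = type_embeddings[joint_id]  # tells the decoder WHICH joint was clicked
        tokens.append([a + b for a, b in zip(pe, emb)])
    return tokens


# Usage: two user-provided keypoints on a 640x480 image.
gauss = make_fourier_matrix(NUM_FEATS)
rng = random.Random(1)
type_embeddings = {j: [rng.gauss(0.0, 0.02) for _ in range(2 * NUM_FEATS)] for j in range(3)}
tokens = encode_keypoint_prompts(
    [(120.0, 80.0, 0), (200.0, 300.0, 2)], 640, 480, gauss, type_embeddings
)
```

Encoding positions with fixed sinusoidal features keeps nearby clicks close in feature space, while the per-joint embedding lets the same location mean "wrist" or "ankle" depending on the prompt.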
Problem

Research questions and friction points this paper is trying to address.

3D human mesh recovery
full-body reconstruction
in-the-wild conditions
pose estimation
generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

promptable model
Momentum Human Rig
3D human mesh recovery
auxiliary prompts
multi-stage annotation
Xitong Yang
Meta Superintelligence Labs
Devansh Kukreja
Meta Superintelligence Labs
Don Pinkus
Meta Superintelligence Labs
Anushka Sagar
Meta Superintelligence Labs
Taosha Fan
Meta AI
Jinhyung Park
PhD Student, Carnegie Mellon University
Computer Vision
Soyong Shin
Meta Superintelligence Labs
Jinkun Cao
Ph.D. student in Robotics, Carnegie Mellon University
Computer Vision, Robotics, AR/VR
Jiawei Liu
Meta Superintelligence Labs
Nicolas Ugrinovic
Meta Superintelligence Labs
Matt Feiszli
Facebook AI Research
Machine Learning, Computer Vision, Harmonic Analysis, Geometry
Jitendra Malik
Meta Superintelligence Labs
Piotr Dollar
Meta Superintelligence Labs
Kris Kitani
Carnegie Mellon University, Meta FAIR
Computer Vision, AI, Machine Learning