Fast SAM 3D Body: Accelerating SAM 3D Body for Real-Time Full-Body Human Mesh Recovery

📅 2026-03-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Although SAM 3D Body achieves high reconstruction accuracy, its inference latency of several seconds hinders real-time applications. This work proposes the first training-free, real-time monocular 3D human mesh reconstruction framework that replaces conventional iterative fitting with a purely feed-forward pipeline. By decoupling spatial dependencies, employing parallel multi-crop feature extraction, applying architecture-aware pruning, and simplifying the Transformer decoder, the method efficiently generates SMPL parameters. It matches or even surpasses the original model’s accuracy—demonstrating superior performance on LSPET—while accelerating end-to-end inference by up to 10.9× and joint kinematic parameter generation by over 10,000×. The framework has been successfully deployed in a vision-only teleoperation system for humanoid robots.

📝 Abstract
SAM 3D Body (3DB) achieves state-of-the-art accuracy in monocular 3D human mesh recovery, yet its inference latency of several seconds per image precludes real-time application. We present Fast SAM 3D Body, a training-free acceleration framework that reformulates the 3DB inference pathway to achieve interactive rates. By decoupling serial spatial dependencies and applying architecture-aware pruning, we enable parallelized multi-crop feature extraction and streamlined transformer decoding. Moreover, to extract the joint-level kinematics (SMPL) compatible with existing humanoid control and policy learning frameworks, we replace the iterative mesh fitting with a direct feedforward mapping, accelerating this specific conversion by over 10,000×. Overall, our framework delivers up to a 10.9× end-to-end speedup while maintaining on-par reconstruction fidelity, even surpassing 3DB on benchmarks such as LSPET. We demonstrate its utility by deploying Fast SAM 3D Body in a vision-only teleoperation system that, unlike methods reliant on wearable IMUs, enables real-time humanoid control and the direct collection of manipulation policies from a single RGB stream.
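The contrast between iterative mesh fitting and a direct feedforward mapping can be illustrated with a minimal toy sketch. This is not the paper's actual architecture: the linear "mesh model" `B`, the pseudo-inverse regressor `W`, and the tiny dimensions are all stand-ins (real SMPL uses 6890 vertices and a nonlinear skinning function), chosen only to show why a single precomputed map is orders of magnitude cheaper than an optimization loop.

```python
import numpy as np

# Toy dimensions; real SMPL has 6890 vertices and 72 pose parameters.
N_VERTS, N_POSE = 50, 12
rng = np.random.default_rng(0)

# Hypothetical linear mesh model: vertices = B @ theta (a stand-in for
# SMPL's nonlinear skinning, just to contrast the two inference styles).
B = rng.standard_normal((N_VERTS * 3, N_POSE))

def fit_iteratively(verts, steps=200, lr=1e-3):
    """Conventional route: gradient-descent fitting of pose to the mesh."""
    theta = np.zeros(N_POSE)
    v = verts.reshape(-1)
    for _ in range(steps):
        grad = 2.0 * B.T @ (B @ theta - v)  # gradient of ||B theta - v||^2
        theta -= lr * grad
    return theta

def fit_feedforward(verts, W):
    """Training-free replacement: one precomputed linear map, no loop."""
    return W @ verts.reshape(-1)

# For this toy linear model the feedforward map is the least-squares
# pseudo-inverse of B, computed once offline.
W = np.linalg.pinv(B)

theta_true = rng.standard_normal(N_POSE)
verts = (B @ theta_true).reshape(N_VERTS, 3)

theta_ff = fit_feedforward(verts, W)   # one matrix-vector product
theta_it = fit_iteratively(verts)      # hundreds of gradient steps
```

Both routes recover the same pose on this toy problem, but the feedforward path costs a single matrix-vector product per frame, which is the kind of per-call saving behind the reported >10,000× conversion speedup.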
Problem

Research questions and friction points this paper is trying to address.

3D human mesh recovery
real-time inference
monocular vision
inference latency
humanoid control
Innovation

Methods, ideas, or system contributions that make the work stand out.

real-time 3D human mesh recovery
training-free acceleration
parallelized feature extraction
feedforward SMPL regression
vision-based teleoperation
Timing Yang
USC Physical Superintelligence (PSI) Lab
Sicheng He
USC Physical Superintelligence (PSI) Lab
Hongyi Jing
USC Physical Superintelligence (PSI) Lab
Jiawei Yang
USC Physical Superintelligence (PSI) Lab
Zhijian Liu
University of California, San Diego; NVIDIA
Chuhang Zou
Research Scientist, Meta Reality Labs (Computer Vision, 3D Deep Learning)
Yue Wang
USC (Computer Vision, Robotics)