SAM-Body4D: Training-Free 4D Human Body Mesh Recovery from Videos

📅 2025-12-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Image-based human mesh recovery (HMR) methods such as SAM 3D Body run frame by frame when applied to videos, which leads to temporal inconsistency and degraded performance under occlusion. SAM-Body4D is a training-free framework for temporally consistent, occlusion-robust 4D human mesh recovery from videos. It uses a promptable video segmentation model to generate identity-consistent masklets, refines them with an Occlusion-Aware module that recovers missing regions, and uses the refined masklets to guide SAM 3D Body toward consistent full-body mesh trajectories. A padding-based parallel strategy makes multi-human inference efficient. The pipeline requires no additional training or fine-tuning. The reported experiments show improved temporal stability and robustness on challenging in-the-wild videos.

📝 Abstract

Human Mesh Recovery (HMR) aims to reconstruct 3D human pose and shape from 2D observations and is fundamental to human-centric understanding in real-world scenarios. While recent image-based HMR methods such as SAM 3D Body achieve strong robustness on in-the-wild images, they rely on per-frame inference when applied to videos, leading to temporal inconsistency and degraded performance under occlusions. We address these issues without extra training by leveraging the inherent human continuity in videos. We propose SAM-Body4D, a training-free framework for temporally consistent and occlusion-robust HMR from videos. We first generate identity-consistent masklets using a promptable video segmentation model, then refine them with an Occlusion-Aware module to recover missing regions. The refined masklets guide SAM 3D Body to produce consistent full-body mesh trajectories, while a padding-based parallel strategy enables efficient multi-human inference. Experimental results demonstrate that SAM-Body4D achieves improved temporal stability and robustness in challenging in-the-wild videos, without any retraining. Our code and demo are available at: https://github.com/gaomingqi/sam-body4d.
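To make the notion of an identity-consistent masklet concrete, here is a minimal sketch of one way to organize the output of a video segmentation step: per-frame masks are regrouped into one sequence per person, with gaps left where a person has no mask. The function name `build_masklets`, the integer track ids, and the use of `None` for missing frames are illustrative assumptions, not the data layout used by SAM-Body4D.

```python
from collections import defaultdict


def build_masklets(per_frame_masks):
    """Group per-frame masks into per-identity sequences (masklets).

    per_frame_masks: list over frames; each entry maps a hypothetical
    track id to that person's mask in the frame (any object).
    Returns {track_id: [mask or None for each frame]}. None marks frames
    where the person has no mask, for example under full occlusion.
    """
    num_frames = len(per_frame_masks)
    masklets = defaultdict(lambda: [None] * num_frames)
    for t, frame in enumerate(per_frame_masks):
        for track_id, mask in frame.items():
            masklets[track_id][t] = mask
    return dict(masklets)


# Toy input: strings stand in for binary masks. Person 2 is missing in frame 1.
frames = [
    {1: "m1_f0", 2: "m2_f0"},
    {1: "m1_f1"},
    {1: "m1_f2", 2: "m2_f2"},
]
masklets = build_masklets(frames)
```

In this toy layout, the gap for person 2 in frame 1 is the kind of missing region a refinement step would need to fill before the masks are passed on to mesh recovery.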

Problem

Research questions and friction points this paper is trying to address.

Per-frame inference with image-based HMR methods produces temporally inconsistent meshes on videos

Reconstruction quality degrades under occlusion, which the paper aims to fix without retraining

Multi-human inference on challenging real-world videos needs to be efficient
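Temporal inconsistency can be made measurable with a simple jitter score. The sketch below computes the mean magnitude of second-order finite differences of a single 3D point over time, which is a common way to quantify jitter. It is an illustration only: the page does not state which metrics the paper reports, and the function name `mean_acceleration` is hypothetical.

```python
def mean_acceleration(trajectory):
    """Mean magnitude of second-order finite differences over time.

    trajectory: sequence of (x, y, z) positions of one point, one per frame.
    Returns 0.0 when fewer than three frames are given.
    """
    total, count = 0.0, 0
    for prev, cur, nxt in zip(trajectory, trajectory[1:], trajectory[2:]):
        # Discrete acceleration: x[t+1] - 2 * x[t] + x[t-1], per coordinate.
        acc = [n - 2 * c + p for p, c, n in zip(prev, cur, nxt)]
        total += sum(a * a for a in acc) ** 0.5
        count += 1
    return total / count if count else 0.0


# A point moving at constant speed versus the same motion with frame-wise jitter.
smooth = [(0.0, 0.0, float(t)) for t in range(5)]
jittery = [(0.0, 0.0, t + (0.5 if t % 2 else -0.5)) for t in range(5)]

smooth_score = mean_acceleration(smooth)
jittery_score = mean_acceleration(jittery)
```

A trajectory with constant velocity scores zero, while independent per-frame errors raise the score even when each individual frame looks plausible.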

Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free framework for 4D human mesh recovery

Occlusion-aware masklet refinement for missing region recovery

Padding-based parallel strategy for efficient multi-human inference
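The padding-based parallel strategy is only named here, not specified. One plausible reading, sketched below, is that frames with different numbers of people are padded to a common count so a single batched call can process them, and the outputs for padded slots are discarded afterwards. The helpers `pad_to_batch` and `run_batched`, the `None` pad item, and the toy `model_fn` are all assumptions for illustration and are not the SAM-Body4D implementation.

```python
def pad_to_batch(per_frame_items, pad_item=None):
    """Pad each frame's list of per-person inputs to the same length.

    Returns (batch, valid), where valid[i][j] is False for padded slots.
    """
    max_people = max((len(items) for items in per_frame_items), default=0)
    batch, valid = [], []
    for items in per_frame_items:
        missing = max_people - len(items)
        batch.append(list(items) + [pad_item] * missing)
        valid.append([True] * len(items) + [False] * missing)
    return batch, valid


def run_batched(model_fn, batch, valid):
    """Call model_fn once on the flattened batch, then drop padded outputs."""
    flat = [item for row in batch for item in row]
    outputs = iter(model_fn(flat))  # single call covering all frames and people
    results = []
    for row_valid in valid:
        row_out = [next(outputs) for _ in row_valid]
        results.append([out for out, ok in zip(row_out, row_valid) if ok])
    return results


# Toy stand-in for a batched model: upper-cases inputs and ignores padding.
def model_fn(items):
    return [None if x is None else x.upper() for x in items]


per_frame_people = [["a", "b"], ["c"], []]
batch, valid = pad_to_batch(per_frame_people)
results = run_batched(model_fn, batch, valid)
```

The trade-off in this reading is some wasted computation on padded slots in exchange for a regular batch shape, which is usually what makes parallel inference straightforward.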

🔎 Similar Papers

DiffMesh: A Motion-aware Diffusion Framework for Human Mesh Recovery from Videos