ShapeGaussian: High-Fidelity 4D Human Reconstruction in Monocular Videos via Vision Priors

📅 2026-02-05
🤖 AI Summary
This work addresses the distortions and artifacts commonly observed in existing 4D human reconstruction methods from monocular video, which often stem from reliance on parametric templates such as SMPL or high sensitivity to pose estimation errors. To overcome these limitations, we propose the first template-free, high-fidelity 4D reconstruction framework. Our approach leverages 2D visual priors together with a pretrained data-driven model to generate an initial deformable geometry, which is subsequently refined through a neural deformation field and a multi-reference-frame strategy to capture fine dynamic details. By eliminating template constraints entirely, our method effectively mitigates issues caused by occluded keypoints and inaccurate pose estimates. Extensive experiments demonstrate that our approach significantly outperforms state-of-the-art template-based methods in reconstruction accuracy, visual quality, and robustness across diverse everyday motions captured in monocular videos.

📝 Abstract
We introduce ShapeGaussian, a high-fidelity, template-free method for 4D human reconstruction from casual monocular videos. Generic reconstruction methods that lack robust vision priors, such as 4DGS, struggle to capture highly deformable human motion without multi-view cues. Template-based approaches built on SMPL, such as HUGS, can produce photorealistic results, but they are highly susceptible to errors in human pose estimation, which often lead to unrealistic artifacts. In contrast, ShapeGaussian integrates template-free vision priors to achieve reconstructions that are both high-fidelity and robust. Our method follows a two-step pipeline: first, we learn a coarse, deformable geometry using pretrained models that estimate data-driven priors, providing a foundation for reconstruction. Then, we refine this geometry with a neural deformation model to capture fine-grained dynamic details. By leveraging 2D vision priors, we avoid the artifacts that erroneous pose estimation causes in template-based methods, and we employ multiple reference frames to handle occluded 2D keypoints in a template-free manner. Extensive experiments demonstrate that ShapeGaussian surpasses template-based methods in reconstruction accuracy, achieving superior visual quality and robustness across diverse human motions in casual monocular videos.
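The refinement step in the abstract describes a neural deformation field: a network that maps canonical geometry plus a time input to per-point offsets. The paper does not publish its architecture here, so the following is only a minimal NumPy sketch of that general idea, assuming Gaussian centers are deformed by a small time-conditioned MLP (all names, sizes, and the single-hidden-layer design are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(in_dim, hidden, out_dim):
    # Tiny one-hidden-layer MLP; real deformation fields are deeper
    # and trained end-to-end with a rendering loss.
    return {
        "W1": rng.normal(0.0, 0.1, (in_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.1, (hidden, out_dim)),
        "b2": np.zeros(out_dim),
    }

def deform(params, centers, t):
    # Condition each canonical 3D center on the timestamp t,
    # predict an offset, and add it back (canonical -> deformed).
    x = np.concatenate(
        [centers, np.full((centers.shape[0], 1), t)], axis=1
    )
    h = np.tanh(x @ params["W1"] + params["b1"])
    offsets = h @ params["W2"] + params["b2"]
    return centers + offsets

# Hypothetical usage: 100 canonical Gaussian centers warped to time t=0.5.
params = init_mlp(in_dim=4, hidden=32, out_dim=3)
canonical = rng.normal(size=(100, 3))
warped = deform(params, canonical, t=0.5)
print(warped.shape)  # (100, 3)
```

In the paper's pipeline this deformation would be applied on top of the coarse geometry from the pretrained priors, with multiple reference frames supervising frames where keypoints are occluded; the sketch only shows the input/output shape of such a field.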
Problem

Research questions and friction points this paper is trying to address.

Keywords: 4D human reconstruction, monocular video, vision priors, template-free, high-fidelity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Keywords: 4D human reconstruction, template-free, vision priors, neural deformation, monocular video