AI Summary
This work proposes a real-time method for 3D human reconstruction and animation that requires no depth supervision, fixed input views, or repeated network inference per target view or pose. In a single forward pass, a neural network maps multi-view RGB images and SMPL-X poses to 3D Gaussian primitives, predicted in a canonical pose and explicitly associated with SMPL-X vertices. For each vertex, one Gaussian is regularized to stay close to the body surface, while a small set of unconstrained Gaussians models details that deviate from it, such as clothing and hair; linear blend skinning then animates the result without further network evaluation. By directly coupling 3D Gaussian splatting with a parametric human body model (a first in the field), the method achieves reconstruction quality comparable to the state of the art on THuman 2.1, AvatarReX, and THuman 4.0, while enabling real-time interaction and animation.
Abstract
We present a generalizable feed-forward Gaussian splatting framework for 3D human reconstruction and real-time animation that operates directly on multi-view RGB images and their associated SMPL-X poses. Unlike prior methods that rely on depth supervision, fixed input views, UV maps, or repeated feed-forward inference for each target view or pose, our approach predicts, in a canonical pose, a set of 3D Gaussian primitives associated with each SMPL-X vertex. One Gaussian is regularized to remain close to the SMPL-X surface, providing a strong geometric prior and stable correspondence to the parametric body model, while an additional small set of unconstrained Gaussians per vertex allows the representation to capture geometric structures that deviate from the parametric surface, such as clothing and hair. In contrast to recent approaches such as HumanRAM, which require repeated network inference to synthesize novel poses, our method produces an animatable human representation from a single forward pass; by explicitly associating Gaussian primitives with SMPL-X vertices, the reconstructed model can be efficiently animated via linear blend skinning without further network evaluation. We evaluate our method on the THuman 2.1, AvatarReX, and THuman 4.0 datasets, where it achieves reconstruction quality comparable to state-of-the-art methods while uniquely supporting real-time animation and interactive applications. Code and pre-trained models are available at https://github.com/Devdoot57/HumanGS.
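To illustrate why vertex-associated Gaussians can be animated without re-running the network, the following is a minimal NumPy sketch of linear blend skinning applied to canonical-pose Gaussians. It is not taken from the authors' code: the function name `skin_gaussians`, the array layout, and the choice to transform covariances with the blended 3x3 block are all assumptions made for illustration. In particular, a weighted blend of rotations is generally not itself a rotation, so the covariance update here is only the usual LBS approximation.

```python
import numpy as np

def skin_gaussians(means, covs, vertex_ids, skin_weights, joint_transforms):
    """Pose canonical Gaussians with linear blend skinning (illustrative sketch).

    means:            (G, 3)    canonical Gaussian centers
    covs:             (G, 3, 3) canonical Gaussian covariances
    vertex_ids:       (G,)      body-model vertex each Gaussian is attached to
    skin_weights:     (V, J)    per-vertex skinning weights, rows sum to 1
    joint_transforms: (J, 4, 4) canonical-to-posed transform of each joint
    """
    # Each Gaussian inherits the skinning weights of its associated vertex.
    w = skin_weights[vertex_ids]                          # (G, J)
    # Blend the joint transforms per Gaussian.
    T = np.einsum("gj,jab->gab", w, joint_transforms)     # (G, 4, 4)
    R, t = T[:, :3, :3], T[:, :3, 3]
    # Move the centers and carry the covariances along with the linear part.
    posed_means = np.einsum("gab,gb->ga", R, means) + t   # (G, 3)
    posed_covs = R @ covs @ np.transpose(R, (0, 2, 1))    # (G, 3, 3)
    return posed_means, posed_covs
```

Because the mapping from Gaussians to vertices is fixed after reconstruction, a new pose only changes `joint_transforms`; the per-frame cost is a few batched matrix products, which is consistent with the real-time animation described above.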