SkinningGS: Editable Dynamic Human Scene Reconstruction Using Gaussian Splatting Based on a Skinning Model

📅 2025-06-25
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the challenges of joint human-background reconstruction and real-time interactive rendering from monocular video. We propose a decoupled dynamic scene modeling framework: (1) a texture-guided SMPL surface point cloud growth mechanism generates a high-fidelity, position- and texture-driven human point cloud; (2) LBS weights enable hyperparameter-free densification and real-time deformation, supporting pose and viewpoint generalization as well as cross-species transfer; (3) human and background are jointly optimized via Gaussian splatting, with geometry and appearance features predicted by a CNN. Experiments demonstrate superior reconstruction quality over HUGS, a 50% reduction in training GPU memory consumption, and real-time rendering at over 100 FPS, approximately six times faster than HUGS. To our knowledge, this is the first method enabling high-quality, real-time interactive editing and novel-view synthesis directly from monocular video input.

๐Ÿ“ Abstract
Reconstructing an interactive human avatar and the background from a monocular video of a dynamic human scene is highly challenging. In this work we adopt a strategy of point cloud decoupling and joint optimization to achieve the decoupled reconstruction of backgrounds and human bodies while preserving the interactivity of human motion. We introduce a position texture to subdivide the Skinned Multi-Person Linear (SMPL) body model's surface and grow the human point cloud. To capture fine details of human dynamics and deformations, we incorporate a convolutional neural network structure to predict human body point cloud features based on texture. This strategy frees our approach from hyperparameter tuning for densification and represents the human with half as many points as HUGS. This approach ensures high-quality human reconstruction and reduces GPU resource consumption during training. As a result, our method surpasses the previous state-of-the-art HUGS in reconstruction metrics while maintaining the ability to generalize to novel poses and views. Furthermore, our technique achieves real-time rendering at over 100 FPS, $\sim$6$\times$ the HUGS speed, using only Linear Blend Skinning (LBS) weights for human transformation. Additionally, this work demonstrates that this framework can be extended to animal scene reconstruction when an accurately posed model of an animal is available.
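The abstract notes that human transformation uses only Linear Blend Skinning (LBS) weights. As a rough illustration of that deformation step (not the paper's actual implementation; array shapes and names are assumptions), LBS blends each point's per-joint bone transforms by its skinning weights and applies the blended transform:

```python
# Minimal Linear Blend Skinning (LBS) sketch. Assumes per-point skinning
# weights and per-joint 4x4 homogeneous bone transforms are given; this is
# illustrative, not the paper's implementation.
import numpy as np

def lbs_deform(points, weights, bone_transforms):
    """Deform rest-pose points with Linear Blend Skinning.

    points:          (N, 3) rest-pose point cloud
    weights:         (N, J) skinning weights, each row summing to 1
    bone_transforms: (J, 4, 4) homogeneous bone transforms
    """
    # Blend each point's bone transforms by its skinning weights: (N, 4, 4)
    blended = np.einsum("nj,jab->nab", weights, bone_transforms)
    # Lift points to homogeneous coordinates: (N, 4)
    homo = np.concatenate([points, np.ones((len(points), 1))], axis=1)
    # Apply each point's blended transform
    deformed = np.einsum("nab,nb->na", blended, homo)
    return deformed[:, :3]
```

Because the blend is a single weighted sum of fixed matrices followed by a matrix-vector product per point, it maps naturally onto the GPU and supports the real-time deformation rates the paper reports.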
Problem

Research questions and friction points this paper is trying to address.

Reconstruct interactive human avatar from monocular video
Decouple and optimize human and background point clouds
Achieve real-time rendering at high frame rates
Innovation

Methods, ideas, or system contributions that make the work stand out.

Point cloud decoupling and joint optimization
Position texture for SMPL model subdivision
CNN for predicting body point cloud features
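The first two innovations grow a dense point cloud on the SMPL surface guided by a position texture, so that texture-space features (e.g. the CNN predictions) can be looked up per point. A hedged sketch of one way such surface growth could work, via barycentric sampling with interpolated UVs (all names and the sampling scheme are illustrative assumptions, not the paper's method):

```python
# Illustrative sketch: grow a surface point cloud from a template mesh by
# uniform barycentric sampling, carrying interpolated UV coordinates so
# texture-space features can be assigned per point. Hypothetical helper,
# not the paper's implementation.
import numpy as np

def grow_surface_points(vertices, faces, uvs, samples_per_face=4, seed=0):
    """vertices: (V, 3) floats, faces: (F, 3) ints, uvs: (V, 2) per-vertex UVs."""
    rng = np.random.default_rng(seed)
    tri = vertices[faces]        # (F, 3, 3) triangle corner positions
    tri_uv = uvs[faces]          # (F, 3, 2) triangle corner UVs
    # Two sorted uniforms per sample give uniform barycentric coordinates
    u = rng.random((len(faces), samples_per_face, 2))
    u.sort(axis=-1)
    bary = np.stack([u[..., 0], u[..., 1] - u[..., 0], 1.0 - u[..., 1]], axis=-1)
    # Interpolate positions and UVs with the same barycentric weights
    points = np.einsum("fsk,fkd->fsd", bary, tri).reshape(-1, 3)
    point_uvs = np.einsum("fsk,fkd->fsd", bary, tri_uv).reshape(-1, 2)
    return points, point_uvs
```

Sampling in texture space rather than splitting Gaussians heuristically is what makes the densification hyperparameter-free: the point budget is fixed by the texture resolution (here, by `samples_per_face`), not by tuned split/clone thresholds.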