🤖 AI Summary
This work addresses the challenge of reconstructing high-fidelity, topology-aware full-body dynamic digital humans from monocular video, particularly the modeling of intricate hand and facial details. The authors propose a structure-aware, fine-grained Gaussian splatting approach that combines a spatial-only triplane with a time-aware hexplane feature representation. A structure-aware Gaussian module is introduced to enhance pose consistency and texture fidelity, while a hand residual refinement module specifically improves local geometric detail. Requiring only single-stage training, the method produces high-quality reconstructions with natural motion dynamics and fine-scale geometry, outperforming state-of-the-art techniques in both quantitative and qualitative evaluations.
📝 Abstract
Reconstructing photorealistic and topology-aware human avatars from monocular videos remains a significant challenge in computer vision and graphics. While existing 3D human avatar modeling approaches can effectively capture body motion, they often fail to accurately model fine details such as hand movements and facial expressions. To address this, we propose Structure-aware Fine-grained Gaussian Splatting (SFGS), a novel method for reconstructing expressive and coherent full-body 3D human avatars from a monocular video sequence. SFGS uses both a spatial-only triplane and a time-aware hexplane to capture dynamic features across consecutive frames. A structure-aware Gaussian module is designed to capture pose-dependent details in a spatially coherent manner and to improve pose and texture expression. To better model hand deformations, we also propose a residual refinement module based on fine-grained hand reconstruction. Our method requires only single-stage training and outperforms state-of-the-art baselines in both quantitative and qualitative evaluations, generating high-fidelity avatars with natural motion and fine details. The code is available on GitHub: https://github.com/Su245811YZ/SFGS
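To make the triplane and hexplane terminology concrete, the following is a minimal NumPy sketch of how such feature planes are commonly queried. It is not taken from the SFGS code: the function names, the plane layout, the coordinate normalization to [0, 1], and the aggregation choices (summing the three planes of each group, then concatenating the spatial and temporal parts) are all assumptions made for illustration. Other plane-based methods use products or different fusion schemes.

```python
import numpy as np

def bilinear_sample(plane, u, v):
    """Bilinearly sample a (R, R, C) feature plane at N points.

    u, v are arrays of shape (N,) with coordinates assumed to lie in [0, 1].
    Returns an (N, C) array of interpolated features.
    """
    res = plane.shape[0]
    x = np.clip(u, 0.0, 1.0) * (res - 1)
    y = np.clip(v, 0.0, 1.0) * (res - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, res - 1)
    y1 = np.minimum(y0 + 1, res - 1)
    wx = (x - x0)[:, None]
    wy = (y - y0)[:, None]
    # Interpolate along the first axis, then along the second.
    near = plane[x0, y0] * (1 - wx) + plane[x1, y0] * wx
    far = plane[x0, y1] * (1 - wx) + plane[x1, y1] * wx
    return near * (1 - wy) + far * wy

def triplane_features(planes, xyz):
    """Spatial-only lookup: sum of features from the xy, xz, and yz planes.

    Summation is an illustrative assumption, not the SFGS definition.
    """
    x, y, z = xyz[:, 0], xyz[:, 1], xyz[:, 2]
    return (bilinear_sample(planes["xy"], x, y)
            + bilinear_sample(planes["xz"], x, z)
            + bilinear_sample(planes["yz"], y, z))

def hexplane_features(planes, xyz, t):
    """Time-aware lookup: three spatial planes plus xt, yt, and zt planes.

    The spatial and temporal sums are concatenated (hypothetical fusion).
    """
    x, y, z = xyz[:, 0], xyz[:, 1], xyz[:, 2]
    spatial = triplane_features(planes, xyz)
    temporal = (bilinear_sample(planes["xt"], x, t)
                + bilinear_sample(planes["yt"], y, t)
                + bilinear_sample(planes["zt"], z, t))
    return np.concatenate([spatial, temporal], axis=-1)
```

In a Gaussian splatting setting, `xyz` would typically be the Gaussian centers and `t` a normalized frame timestamp, with the returned features decoded by a small network into per-Gaussian attributes. That decoding step is omitted here because the abstract does not specify it.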