🤖 AI Summary
Existing full-body Gaussian avatar methods struggle to preserve high-frequency facial expressions and geometric details because their facial representation capacity is limited. This work proposes F3G-Avatar, a dual-branch animatable avatar method that starts from a clothed Momentum Human Rig (MHR) template and models pose-dependent non-rigid body deformations and fine facial geometry in separate branches. The predicted Gaussians are fused, posed with linear blend skinning (LBS), and rendered with differentiable Gaussian splatting. Training combines reconstruction and perceptual objectives with a face-specific adversarial loss, which improves realism in close-up facial views. On the AvatarReX dataset, the method reports face-view rendering performance of PSNR 26.243, SSIM 0.964, and LPIPS 0.084.
📝 Abstract
Existing full-body Gaussian avatar methods primarily optimize global reconstruction quality and often fail to preserve fine-grained facial geometry and expression details. The cause is limited facial representational capacity, which makes high-frequency pose-dependent deformations difficult to model. To address this, we propose F3G-Avatar, a full-body, face-aware avatar synthesis method that reconstructs animatable human representations from multi-view RGB video and regressed pose/shape parameters. Starting from a clothed Momentum Human Rig (MHR) template, front/back positional maps are rendered and decoded into 3D Gaussians through a two-branch architecture: a body branch that captures pose-dependent non-rigid deformations and a face-focused deformation branch that refines head geometry and appearance. The predicted Gaussians are fused, posed with linear blend skinning (LBS), and rendered with differentiable Gaussian splatting. Training combines reconstruction and perceptual objectives with a face-specific adversarial loss to enhance realism in close-up views. Experiments demonstrate strong rendering quality, with face-view performance reaching PSNR/SSIM/LPIPS of 26.243/0.964/0.084 on the AvatarReX dataset. Ablations further highlight the contributions of the MHR template and the face-focused deformation branch. F3G-Avatar provides a practical, high-quality pipeline for realistic, animatable full-body avatar synthesis.
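To make the posing step concrete, the following is a minimal NumPy sketch of standard linear blend skinning applied to Gaussian centers. It is an illustration of the general LBS formula, not the paper's implementation: the function name `linear_blend_skinning`, the array shapes, and the toy two-joint rig are all assumptions introduced here, and the sketch poses only the centers (a full pipeline would also transform each Gaussian's rotation or covariance).

```python
import numpy as np


def linear_blend_skinning(points, weights, joint_transforms):
    """Pose canonical points with linear blend skinning (illustrative sketch).

    points:           (N, 3) canonical-space Gaussian centers
    weights:          (N, J) skinning weights, each row summing to 1
    joint_transforms: (J, 4, 4) homogeneous canonical-to-posed transforms
    returns:          (N, 3) posed centers
    """
    # Blend the per-joint transforms for every point: result is (N, 4, 4).
    blended = np.einsum("nj,jab->nab", weights, joint_transforms)
    # Lift points to homogeneous coordinates: (N, 4).
    ones = np.ones((points.shape[0], 1), dtype=points.dtype)
    homogeneous = np.concatenate([points, ones], axis=1)
    # Apply each point's blended transform and drop the homogeneous coordinate.
    posed = np.einsum("nab,nb->na", blended, homogeneous)
    return posed[:, :3]


# Hypothetical toy rig: joint 0 stays fixed, joint 1 translates by +2 along x.
identity = np.eye(4)
shifted = np.eye(4)
shifted[0, 3] = 2.0
joint_transforms = np.stack([identity, shifted])

points = np.zeros((3, 3))
weights = np.array([
    [1.0, 0.0],  # bound entirely to joint 0
    [0.0, 1.0],  # bound entirely to joint 1
    [0.5, 0.5],  # blended equally between both joints
])

posed = linear_blend_skinning(points, weights, joint_transforms)
```

In this toy setup the equally weighted point lands halfway between the two joint outcomes, which is the blending behavior that lets canonical-space Gaussians follow an articulated skeleton.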