🤖 AI Summary
Current controllable photorealistic portrait generation methods suffer from pervasive rendering blur, limiting visual fidelity and practical applicability. To address this, we propose a neural-rendering-based framework for real-time, high-fidelity digital human avatar synthesis. Our approach introduces a novel cylindrical-surface Gaussian point cloud representation, enhanced with skeletal priors, to achieve geometric densification. We further design a view-aware segmentation module to improve cross-view semantic consistency. The end-to-end architecture integrates Gaussian splatting rendering, a UNet-based generator, camera calibration, and multimodal feature fusion. Evaluated on the ZJU Mocap and THuman4 benchmarks, our method achieves PSNR scores of 32.94 dB and 33.39 dB, respectively, while sustaining a rendering speed of 79 FPS, significantly outperforming state-of-the-art approaches in both quality and efficiency.
📝 Abstract
Photorealistic and controllable human avatars have gained popularity in the research community thanks to rapid advances in neural rendering, which provide fast and realistic synthesis tools. However, a limitation of current solutions is noticeable blurring. To solve this problem, we propose GaussianGAN, an animatable avatar approach developed for photorealistic rendering of people in real time. We introduce a novel Gaussian splatting densification strategy that builds Gaussian points on the surface of cylindrical structures around estimated skeletal limbs. Given the camera calibration, we render an accurate semantic segmentation with our novel view segmentation module. Finally, a UNet generator uses the rendered Gaussian splatting features and the segmentation maps to create photorealistic digital avatars. Our method runs in real time at a rendering speed of 79 FPS. It outperforms previous methods in visual perception and quality, achieving state-of-the-art pixel fidelity of 32.94 dB on the ZJU Mocap dataset and 33.39 dB on the THuman4 dataset.
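The densification idea described above, placing Gaussian points on the surface of a cylinder fitted around each estimated skeletal limb, can be sketched geometrically as follows. This is a minimal illustration under assumed conventions, not the paper's implementation: the function name, the fixed radius, and the sampling densities (`n_rings`, `n_per_ring`) are all hypothetical choices.

```python
import numpy as np

def cylinder_surface_points(p0, p1, radius, n_rings=8, n_per_ring=16):
    """Sample candidate Gaussian centers on the lateral surface of a
    cylinder whose axis runs from joint p0 to joint p1 (both (3,) arrays).

    Hypothetical sketch: the real method would choose radius and density
    per limb and attach learnable Gaussian parameters to each point.
    """
    axis = p1 - p0
    length = np.linalg.norm(axis)
    d = axis / length  # unit direction along the limb
    # Build an orthonormal basis (u, v) spanning the plane perpendicular to d.
    helper = np.array([1.0, 0.0, 0.0]) if abs(d[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    u = np.cross(d, helper)
    u /= np.linalg.norm(u)
    v = np.cross(d, u)
    # Rings spaced evenly along the limb; points spaced evenly around each ring.
    ts = np.linspace(0.0, 1.0, n_rings)
    thetas = np.linspace(0.0, 2.0 * np.pi, n_per_ring, endpoint=False)
    centers = p0[None, :] + ts[:, None] * axis[None, :]            # (n_rings, 3)
    offsets = radius * (np.cos(thetas)[:, None] * u[None, :]
                        + np.sin(thetas)[:, None] * v[None, :])    # (n_per_ring, 3)
    return (centers[:, None, :] + offsets[None, :, :]).reshape(-1, 3)
```

In a full pipeline, each sampled point would presumably be initialized as a 3D Gaussian (with learnable covariance, opacity, and feature vector, as in standard Gaussian splatting) and then rasterized to produce the feature maps consumed by the UNet generator.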