🤖 AI Summary
This work addresses the challenge of real-time, high-fidelity head avatar rendering on mobile devices. We propose a hybrid representation combining a triangle mesh and anisotropic 3D Gaussians: the mesh models surface structures (e.g., facial skin), while the 3D Gaussians capture complex non-surface details (e.g., hair and beard). The mesh is integrated into the differentiable Gaussian splatting framework as a semi-transparent layer, yielding a single unified differentiable rendering pipeline. A neural decoding network, trained with multi-view image supervision, maps a facial expression code to the mesh, an RGBA texture, and the Gaussians. Visual fidelity matches that of purely Gaussian-based approaches, while rendering speed reaches that of conventional mesh-based avatars and memory consumption is reduced. To our knowledge, this is the first method enabling real-time, high-fidelity head rendering on mobile platforms.
📝 Abstract
We present Gaussian Pixel Codec Avatars (GPiCA), photorealistic head avatars that can be generated from multi-view images and efficiently rendered on mobile devices. GPiCA utilizes a unique hybrid representation that combines a triangle mesh and anisotropic 3D Gaussians. This combination maximizes memory and rendering efficiency while maintaining a photorealistic appearance. The triangle mesh is highly efficient in representing surface areas like facial skin, while the 3D Gaussians effectively handle non-surface areas such as hair and beard. To this end, we develop a unified differentiable rendering pipeline that treats the mesh as a semi-transparent layer within the volumetric rendering paradigm of 3D Gaussian Splatting. We train neural networks to decode a facial expression code into three components: a 3D face mesh, an RGBA texture, and a set of 3D Gaussians. These components are rendered simultaneously in a unified rendering engine. The networks are trained using multi-view image supervision. Our results demonstrate that GPiCA achieves the realism of purely Gaussian-based avatars while matching the rendering performance of mesh-based avatars.
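The idea of treating the mesh as a semi-transparent layer inside volumetric rendering can be illustrated with standard front-to-back alpha compositing for a single pixel. This is only a minimal sketch and not the authors' renderer, which the abstract does not specify. The function name `composite_pixel`, the `(depth, color, alpha)` layer layout, and the assumption that each Gaussian's per-pixel alpha has already been evaluated are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

RGB = Tuple[float, float, float]
# Hypothetical per-pixel layer: (depth along the ray, RGB color, alpha in [0, 1]).
Layer = Tuple[float, RGB, float]


def composite_pixel(
    gaussians: List[Layer],
    mesh: Layer,
    background: RGB = (0.0, 0.0, 0.0),
) -> RGB:
    """Blend Gaussian samples and one mesh sample for a single pixel.

    The mesh sample (color and alpha, e.g. looked up from an RGBA texture)
    is inserted into the depth-sorted list like any other layer, so it can
    partially occlude Gaussians behind it and be occluded by those in front.
    """
    # Sort all contributions front to back by depth.
    layers = sorted(gaussians + [mesh], key=lambda layer: layer[0])

    out = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light still passing through
    for _depth, color, alpha in layers:
        weight = transmittance * alpha
        for c in range(3):
            out[c] += weight * color[c]
        transmittance *= 1.0 - alpha

    # Whatever light remains reaches the background.
    for c in range(3):
        out[c] += transmittance * background[c]
    return (out[0], out[1], out[2])


if __name__ == "__main__":
    hair = [(1.0, (1.0, 0.0, 0.0), 0.5)]   # one Gaussian in front of the skin
    skin = (2.0, (0.0, 1.0, 0.0), 1.0)     # opaque mesh sample
    print(composite_pixel(hair, skin))
```

Because every step is a sum or product of colors and alphas, the same blend is differentiable with respect to both the mesh texture and the Gaussian parameters, which is what makes joint training from image supervision plausible in a pipeline of this kind.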