🤖 AI Summary
Existing methods struggle to achieve high-fidelity, real-time 3D full-body talking avatars on mobile/AR devices, particularly due to bottlenecks in jointly modeling facial micro-expressions and body motion, as well as in lightweight rendering. This paper introduces the first mobile-oriented full-body talking avatar framework based on 3D Gaussian Splatting. It leverages a personalized parametric human template and pioneers knowledge distillation of non-rigid deformation into a lightweight MLP, augmented with blend shapes for geometric detail compensation. A StyleUNet-pretrained feature guidance module enables multi-signal co-driving (speech, text, and pose). Evaluated on binocular AR devices such as the Apple Vision Pro, the system achieves real-time rendering at 90 FPS. It significantly outperforms state-of-the-art methods in facial expression naturalness and body-motion coherence, marking the first end-to-end real-time synthesis of high-accuracy full-body avatars on resource-constrained platforms.
📝 Abstract
Realistic 3D full-body talking avatars hold great potential in AR, with applications ranging from e-commerce live streaming to holographic communication. Despite advances in 3D Gaussian Splatting (3DGS) for lifelike avatar creation, existing methods struggle with fine-grained control of facial expressions and body movements in full-body talking tasks. Additionally, they often lack sufficient detail and cannot run in real-time on mobile devices. We present TaoAvatar, a high-fidelity, lightweight, 3DGS-based full-body talking avatar driven by various signals. Our approach starts by creating a personalized clothed human parametric template that binds Gaussians to represent appearances. We then pre-train a StyleUnet-based network to handle complex pose-dependent non-rigid deformation, which can capture high-frequency appearance details but is too resource-intensive for mobile devices. To overcome this, we "bake" the non-rigid deformations into a lightweight MLP-based network using a distillation technique and develop blend shapes to compensate for details. Extensive experiments show that TaoAvatar achieves state-of-the-art rendering quality while running in real-time across various devices, maintaining 90 FPS on high-definition stereo devices such as the Apple Vision Pro.
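The "baking" step described above can be illustrated with a toy distillation loop. This is a minimal sketch, not the paper's implementation: the teacher stands in for the pre-trained StyleUnet deformation network, the student is a small one-hidden-layer MLP, and all shapes, names, and the NumPy training loop are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N_VERTS, POSE_DIM, HIDDEN, BATCH = 64, 8, 32, 16

# Hypothetical teacher: a fixed nonlinear map from a pose vector to
# per-vertex non-rigid offsets, standing in for the heavy StyleUnet.
W_teacher = rng.normal(scale=0.1, size=(POSE_DIM, N_VERTS * 3))

def teacher_deform(pose):
    return np.tanh(pose @ W_teacher).reshape(-1, N_VERTS, 3)

# Lightweight student MLP (the "baked" network for mobile inference).
W1 = rng.normal(scale=0.1, size=(POSE_DIM, HIDDEN))
W2 = rng.normal(scale=0.1, size=(HIDDEN, N_VERTS * 3))

def student_forward(pose):
    h = np.maximum(pose @ W1, 0.0)               # ReLU hidden layer
    return h, (h @ W2).reshape(-1, N_VERTS, 3)

val_pose = rng.normal(size=(32, POSE_DIM))       # held-out driving poses
val_target = teacher_deform(val_pose)

def val_mse():
    _, pred = student_forward(val_pose)
    return float(np.mean((pred - val_target) ** 2))

loss_before = val_mse()
lr = 1e-2
for _ in range(2000):
    pose = rng.normal(size=(BATCH, POSE_DIM))    # sample driving poses
    target = teacher_deform(pose)                # teacher labels to distill
    h, pred = student_forward(pose)
    err = (pred - target).reshape(BATCH, -1)     # MSE error (scale folded into lr)
    grad_W2 = h.T @ err / BATCH
    grad_h = err @ W2.T
    grad_h[h <= 0] = 0.0                         # ReLU backward mask
    grad_W1 = pose.T @ grad_h / BATCH
    W2 -= lr * grad_W2
    W1 -= lr * grad_W1
loss_after = val_mse()
```

After distillation, only the two small student matrices are needed at runtime; the paper's blend shapes would then compensate for residual detail the MLP cannot fit.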