🤖 AI Summary
Existing methods struggle to achieve high-fidelity, real-time 3D full-body talking avatars on mobile/AR devices, particularly due to bottlenecks in jointly modeling facial micro-expressions and body motion, as well as in lightweight rendering. This paper introduces the first mobile-oriented full-body talking avatar framework based on 3D Gaussian Splatting. It leverages a personalized parametric human template and pioneers knowledge distillation of non-rigid deformation into a lightweight MLP, augmented with blend shapes for geometric detail compensation. A StyleUNet-pretrained feature guidance module enables multi-signal co-driving (speech, text, and pose). Evaluated on binocular AR devices such as the Apple Vision Pro, the system achieves real-time rendering at 90 FPS. It significantly outperforms state-of-the-art methods in facial expression naturalness and body-motion coherence, marking the first end-to-end real-time synthesis of high-accuracy full-body avatars on resource-constrained platforms.
📝 Abstract
Realistic 3D full-body talking avatars hold great potential in AR, with applications ranging from e-commerce live streaming to holographic communication. Despite advances in 3D Gaussian Splatting (3DGS) for lifelike avatar creation, existing methods struggle with fine-grained control of facial expressions and body movements in full-body talking tasks. Additionally, they often lack sufficient detail and cannot run in real-time on mobile devices. We present TaoAvatar, a high-fidelity, lightweight, 3DGS-based full-body talking avatar driven by various signals. Our approach starts by creating a personalized clothed human parametric template that binds Gaussians to represent appearances. We then pre-train a StyleUnet-based network to handle complex pose-dependent non-rigid deformation, which can capture high-frequency appearance details but is too resource-intensive for mobile devices. To overcome this, we "bake" the non-rigid deformations into a lightweight MLP-based network using a distillation technique and develop blend shapes to compensate for details. Extensive experiments show that TaoAvatar achieves state-of-the-art rendering quality while running in real-time across various devices, maintaining 90 FPS on high-definition stereo devices such as the Apple Vision Pro.
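The "baking" step described above can be illustrated with a toy distillation loop. This is a minimal sketch, not the paper's implementation: the teacher stands in for the pre-trained StyleUnet deformation network, the student is a small one-hidden-layer MLP, and all shapes, names, and the NumPy training loop are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N_VERTS, POSE_DIM, HIDDEN, BATCH = 64, 8, 32, 16

# Hypothetical teacher: a fixed nonlinear map from a pose vector to
# per-vertex non-rigid offsets, standing in for the heavy StyleUnet.
W_teacher = rng.normal(scale=0.1, size=(POSE_DIM, N_VERTS * 3))

def teacher_deform(pose):
    return np.tanh(pose @ W_teacher).reshape(-1, N_VERTS, 3)

# Lightweight student MLP (the "baked" network for mobile inference).
W1 = rng.normal(scale=0.1, size=(POSE_DIM, HIDDEN))
W2 = rng.normal(scale=0.1, size=(HIDDEN, N_VERTS * 3))

def student_forward(pose):
    h = np.maximum(pose @ W1, 0.0)               # ReLU hidden layer
    return h, (h @ W2).reshape(-1, N_VERTS, 3)

val_pose = rng.normal(size=(32, POSE_DIM))       # held-out driving poses
val_target = teacher_deform(val_pose)

def val_mse():
    _, pred = student_forward(val_pose)
    return float(np.mean((pred - val_target) ** 2))

loss_before = val_mse()
lr = 1e-2
for _ in range(2000):
    pose = rng.normal(size=(BATCH, POSE_DIM))    # sample driving poses
    target = teacher_deform(pose)                # teacher labels to distill
    h, pred = student_forward(pose)
    err = (pred - target).reshape(BATCH, -1)     # MSE error (scale folded into lr)
    grad_W2 = h.T @ err / BATCH
    grad_h = err @ W2.T
    grad_h[h <= 0] = 0.0                         # ReLU backward mask
    grad_W1 = pose.T @ grad_h / BATCH
    W2 -= lr * grad_W2
    W1 -= lr * grad_W1
loss_after = val_mse()
```

After distillation, only the two small student matrices are needed at runtime; the paper's blend shapes would then compensate for residual detail the MLP cannot fit.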