Towards High-fidelity 3D Talking Avatar with Personalized Dynamic Texture

📅 2025-03-01
📈 Citations: 0 (influential: 0)
🤖 AI Summary
Existing speech-driven 3D talking-head methods primarily focus on geometric motion modeling, neglecting the critical impact of dynamic texture on visual fidelity. This paper proposes TexTalker—the first diffusion-based framework for joint generation of speech-driven geometry and dynamic texture—and introduces TexTalk4D, the first high-resolution 4D dynamic texture dataset (100 subjects, 100 minutes, 8K texture maps). We innovatively design a pivot-point-based style injection mechanism to enable disentangled, personalized control over motion and texture appearance. Quantitative and qualitative evaluations demonstrate that TexTalker consistently outperforms state-of-the-art methods in facial motion accuracy, texture photorealism, lip-sync precision, and spatiotemporal detail consistency. It enables synthesis of high-fidelity, 8K-resolution dynamic-texture 3D avatars with fine-grained temporal coherence.
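
A minimal PyTorch sketch of the joint denoising idea described above, not the authors' implementation: it assumes per-frame motion and texture latents are concatenated and denoised by a single audio-conditioned transformer. All module names, dimensions, and the backbone choice are hypothetical.

```python
import torch
import torch.nn as nn

class JointDenoiser(nn.Module):
    """Predicts noise for motion and texture latents jointly, conditioned on audio."""
    def __init__(self, motion_dim=64, tex_dim=256, audio_dim=768, hidden=512):
        super().__init__()
        self.in_proj = nn.Linear(motion_dim + tex_dim, hidden)  # fuse both streams
        self.audio_proj = nn.Linear(audio_dim, hidden)          # speech condition
        self.time_mlp = nn.Sequential(                          # diffusion timestep
            nn.Linear(1, hidden), nn.SiLU(), nn.Linear(hidden, hidden))
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=4)
        self.out_proj = nn.Linear(hidden, motion_dim + tex_dim)

    def forward(self, noisy, audio_feat, t):
        # noisy: (B, T, motion_dim + tex_dim); audio_feat: (B, T, audio_dim); t: (B,)
        t_emb = self.time_mlp(t.float().view(-1, 1)).unsqueeze(1)  # (B, 1, hidden)
        h = self.in_proj(noisy) + self.audio_proj(audio_feat) + t_emb
        return self.out_proj(self.backbone(h))  # noise for both streams at once

model = JointDenoiser()
eps = model(torch.randn(2, 30, 64 + 256),   # noisy motion+texture latents
            torch.randn(2, 30, 768),        # e.g. wav2vec-2.0-style audio features
            torch.randint(0, 1000, (2,)))   # diffusion timesteps
```

Denoising both streams through one backbone is what lets a model exploit the motion-texture correlation the summary highlights, rather than training two independent generators.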

📝 Abstract
Significant progress has been made in speech-driven 3D face animation, but most works focus on learning the motion of the mesh/geometry, ignoring the impact of dynamic texture. In this work, we reveal that dynamic texture plays a key role in rendering high-fidelity talking avatars, and introduce a high-resolution 4D dataset TexTalk4D, consisting of 100 minutes of audio-synced scan-level meshes with detailed 8K dynamic textures from 100 subjects. Based on the dataset, we explore the inherent correlation between motion and texture, and propose a diffusion-based framework TexTalker to simultaneously generate facial motions and dynamic textures from speech. Furthermore, we propose a novel pivot-based style injection strategy to capture the complexity of different texture and motion styles, which allows disentangled control. TexTalker, as the first method to generate audio-synced facial motion with dynamic texture, not only outperforms prior art in synthesising facial motions, but also produces realistic textures that are consistent with the underlying facial movements. Project page: https://xuanchenli.github.io/TexTalk/.
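
The dataset contribution pairs geometry and texture per frame. A hypothetical per-frame record for a TexTalk4D-style loader is sketched below; field names and shapes are illustrative, not the dataset's actual schema.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class TexTalk4DFrame:
    subject_id: str           # one of the 100 captured subjects
    audio_window: np.ndarray  # raw audio samples time-aligned to this frame
    vertices: np.ndarray      # (V, 3) scan-level mesh vertices for this frame
    texture_path: str         # path to the frame's 8K dynamic texture (UV space)
```

Keeping the two modalities frame-aligned is what makes it possible to learn the motion-texture correlation the abstract describes.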
Problem

Research questions and friction points this paper is trying to address.

Develop high-fidelity 3D talking avatars with dynamic textures.
Explore correlation between facial motion and dynamic texture.
Generate synchronized facial motion and texture from speech.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces TexTalk4D dataset with 8K textures
Proposes TexTalker for motion and texture generation
Uses pivot-based style injection for disentangled control (see the sketch after this list)
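
One way the disentangled pivot-based control could look: separate learned style embeddings ("pivots") per subject for motion and for texture, so either can be swapped independently at inference time. The injection mechanism shown here (a plain embedding lookup, consumed downstream by e.g. addition or AdaLN) is an assumption, not the paper's exact design.

```python
import torch
import torch.nn as nn

class PivotStyleBank(nn.Module):
    """Per-subject style pivots, kept separate for motion and texture."""
    def __init__(self, num_subjects=100, dim=512):
        super().__init__()
        self.motion_pivots = nn.Embedding(num_subjects, dim)
        self.texture_pivots = nn.Embedding(num_subjects, dim)

    def forward(self, motion_id, texture_id):
        # Independent lookups allow mixing styles across subjects.
        return self.motion_pivots(motion_id), self.texture_pivots(texture_id)

bank = PivotStyleBank()
# Disentangled control: subject 3's speaking style with subject 7's skin appearance.
m_style, t_style = bank(torch.tensor([3]), torch.tensor([7]))
```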
Xuanchen Li
Shanghai Jiao Tong University
Digital Human · Humanoid Robot · AIGC · Computer Vision · Image Restoration
Jianyu Wang
MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University
Yuhao Cheng
MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University
Yikun Zeng
MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University
Xingyu Ren
Ph.D. graduate, Shanghai Jiao Tong University
Face Modeling · Generative AI
Wenhan Zhu
Xueshen AI
Weiming Zhao
Student Innovation Center, Shanghai Jiao Tong University
Yichao Yan
MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University