EmoDiffTalk:Emotion-aware Diffusion for Editable 3D Gaussian Talking Head

📅 2025-11-30
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing 3D Gaussian splatting methods exhibit significant limitations in photo-realistic emotional editing of talking heads, particularly lacking fine-grained, dynamic, and multimodal-coordinated affective control. This paper introduces the first editable 3D talking-head framework. Its core innovations are: (1) an emotion-aware Gaussian diffusion mechanism that embeds Action Units (AUs) as differentiable semantic representations into the Gaussian point cloud optimization; and (2) a text-driven AU prompt diffusion controller enabling end-to-end mapping from natural language to nuanced facial dynamics. Experiments on public benchmarks demonstrate substantial improvements over state-of-the-art methods in emotional expressiveness, lip-sync accuracy (LSE reduced by 18.7%), and multimodal controllability. The framework establishes a new paradigm for high-fidelity, editable, multimodal 3D talking-head synthesis.

Technology Category

Application Category

📝 Abstract
Recent photo-realistic 3D talking head via 3D Gaussian Splatting still has significant shortcoming in emotional expression manipulation, especially for fine-grained and expansive dynamics emotional editing using multi-modal control. This paper introduces a new editable 3D Gaussian talking head, i.e. EmoDiffTalk. Our key idea is a novel Emotion-aware Gaussian Diffusion, which includes an action unit (AU) prompt Gaussian diffusion process for fine-grained facial animator, and moreover an accurate text-to-AU emotion controller to provide accurate and expansive dynamic emotional editing using text input. Experiments on public EmoTalk3D and RenderMe-360 datasets demonstrate superior emotional subtlety, lip-sync fidelity, and controllability of our EmoDiffTalk over previous works, establishing a principled pathway toward high-quality, diffusion-driven, multimodal editable 3D talking-head synthesis. To our best knowledge, our EmoDiffTalk is one of the first few 3D Gaussian Splatting talking-head generation framework, especially supporting continuous, multimodal emotional editing within the AU-based expression space.
Problem

Research questions and friction points this paper is trying to address.

Enhancing emotional expression manipulation in 3D talking heads
Enabling fine-grained and expansive dynamic emotional editing
Providing accurate multimodal control via text-to-AU emotion controller
Innovation

Methods, ideas, or system contributions that make the work stand out.

Emotion-aware Gaussian Diffusion for facial animation
Text-to-AU controller enabling dynamic emotional editing
Multimodal control for continuous 3D talking-head synthesis
🔎 Similar Papers
No similar papers found.
C
Chang Liu
Beijing Normal University
T
Tianjiao Jing
Beijing Normal University
C
Chengcheng Ma
Beijing Normal University
X
Xuanqi Zhou
Beijing Normal University
Z
Zhengxuan Lian
Beijing Normal University
Qin Jin
Qin Jin
中国人民大学信息学院
人工智能
H
Hongliang Yuan
Tencent AI Lab
Shi-Sheng Huang
Shi-Sheng Huang
Associate Professor, Beijing Normal University
Online 3D ReconstructionDynamic View SynthesisvSLAM