EmoDiffTalk:Emotion-aware Diffusion for Editable 3D Gaussian Talking Head

📅 2025-11-30

📈 Citations: 0

✨ Influential: 0

career value

217K/year

🤖 AI Summary

Existing 3D Gaussian splatting methods exhibit significant limitations in photo-realistic emotional editing of talking heads, particularly lacking fine-grained, dynamic, and multimodal-coordinated affective control. This paper introduces the first editable 3D talking-head framework. Its core innovations are: (1) an emotion-aware Gaussian diffusion mechanism that embeds Action Units (AUs) as differentiable semantic representations into the Gaussian point cloud optimization; and (2) a text-driven AU prompt diffusion controller enabling end-to-end mapping from natural language to nuanced facial dynamics. Experiments on public benchmarks demonstrate substantial improvements over state-of-the-art methods in emotional expressiveness, lip-sync accuracy (LSE reduced by 18.7%), and multimodal controllability. The framework establishes a new paradigm for high-fidelity, editable, multimodal 3D talking-head synthesis.

Technology Category

Application Category

📝 Abstract

Recent photo-realistic 3D talking head via 3D Gaussian Splatting still has significant shortcoming in emotional expression manipulation, especially for fine-grained and expansive dynamics emotional editing using multi-modal control. This paper introduces a new editable 3D Gaussian talking head, i.e. EmoDiffTalk. Our key idea is a novel Emotion-aware Gaussian Diffusion, which includes an action unit (AU) prompt Gaussian diffusion process for fine-grained facial animator, and moreover an accurate text-to-AU emotion controller to provide accurate and expansive dynamic emotional editing using text input. Experiments on public EmoTalk3D and RenderMe-360 datasets demonstrate superior emotional subtlety, lip-sync fidelity, and controllability of our EmoDiffTalk over previous works, establishing a principled pathway toward high-quality, diffusion-driven, multimodal editable 3D talking-head synthesis. To our best knowledge, our EmoDiffTalk is one of the first few 3D Gaussian Splatting talking-head generation framework, especially supporting continuous, multimodal emotional editing within the AU-based expression space.

Problem

Research questions and friction points this paper is trying to address.

Enhancing emotional expression manipulation in 3D talking heads

Enabling fine-grained and expansive dynamic emotional editing

Providing accurate multimodal control via text-to-AU emotion controller

Innovation

Methods, ideas, or system contributions that make the work stand out.

Emotion-aware Gaussian Diffusion for facial animation

Text-to-AU controller enabling dynamic emotional editing

Multimodal control for continuous 3D talking-head synthesis

🔎 Similar Papers

No similar papers found.