🤖 AI Summary
Existing 3D Gaussian Splatting methods remain limited in photo-realistic emotional editing of talking heads, particularly in fine-grained, dynamic, and multimodally coordinated affective control. This paper introduces EmoDiffTalk, which the authors describe as one of the first editable 3D Gaussian talking-head frameworks. Its core innovations are: (1) an emotion-aware Gaussian diffusion mechanism that embeds Action Units (AUs) as differentiable semantic representations into the Gaussian point cloud optimization; and (2) a text-driven AU prompt diffusion controller enabling end-to-end mapping from natural language to nuanced facial dynamics. Experiments on the public EmoTalk3D and RenderMe-360 datasets demonstrate substantial improvements over state-of-the-art methods in emotional expressiveness, lip-sync accuracy (LSE reduced by 18.7%), and multimodal controllability. The framework points toward high-fidelity, editable, multimodal 3D talking-head synthesis.
📝 Abstract
Recent photo-realistic 3D talking heads built on 3D Gaussian Splatting still have significant shortcomings in emotional expression manipulation, especially in fine-grained and expansive dynamic emotional editing under multimodal control. This paper introduces a new editable 3D Gaussian talking head, EmoDiffTalk. Our key idea is a novel Emotion-aware Gaussian Diffusion, which comprises an action unit (AU) prompt Gaussian diffusion process for fine-grained facial animation and a text-to-AU emotion controller that provides accurate and expansive dynamic emotional editing from text input. Experiments on the public EmoTalk3D and RenderMe-360 datasets demonstrate the superior emotional subtlety, lip-sync fidelity, and controllability of EmoDiffTalk over previous works, establishing a principled pathway toward high-quality, diffusion-driven, multimodal editable 3D talking-head synthesis. To the best of our knowledge, EmoDiffTalk is one of the first few 3D Gaussian Splatting talking-head generation frameworks to support continuous, multimodal emotional editing within the AU-based expression space.