🤖 AI Summary
Existing few-shot 3D talking head synthesis methods often suffer from geometric instability and a mismatch between audio and facial expressions when generating rich emotional dynamics. To address these limitations, this work proposes a pretrain-and-adapt framework that explicitly models facial motion in the FLAME parameter space, thereby incorporating geometric priors. A Gated Residual Motion Network (GRMN) fuses prosodic emotional cues from the audio with head-pose and upper-face motion signals that audio alone does not provide. By integrating 3D Gaussian splatting with emotion-aware audio driving, the proposed approach significantly improves expression coherence, lip-sync accuracy, visual realism, and motion stability in few-shot settings, achieving state-of-the-art performance.
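To make the geometric-prior argument concrete, the sketch below contrasts the two motion formulations in PyTorch. It is illustrative only: the tensor dimensions, the linear blendshape decoder, and the idea of binding Gaussians to mesh vertices are assumptions for exposition, not the paper's actual pipeline.

```python
import torch

# Illustrative assumptions, not EmoTaG's implementation.
N_GAUSSIANS = 50_000          # 3D Gaussians in the head avatar (assumed)
N_VERTICES = 5_023            # FLAME mesh vertex count
EXPR_DIM, POSE_DIM = 50, 6    # expression + (jaw, global) pose parameters (assumed)

# (a) Direct deformation: an unconstrained 3D offset per Gaussian,
#     with no structural prior on how a face can move.
direct_offsets = torch.randn(N_GAUSSIANS, 3)

# (b) Structured formulation: a compact FLAME-style parameter vector is
#     predicted per frame and decoded through a fixed blendshape basis;
#     Gaussians attached to the mesh inherit this constrained motion.
expr_basis = torch.randn(N_VERTICES, 3, EXPR_DIM)    # fixed basis (assumed)
flame_params = torch.randn(EXPR_DIM + POSE_DIM)      # per-frame prediction

expr_offsets = torch.einsum("vce,e->vc", expr_basis, flame_params[:EXPR_DIM])
print(direct_offsets.shape, expr_offsets.shape)
# torch.Size([50000, 3]) torch.Size([5023, 3])
```

The low-dimensional parameterization keeps each predicted frame on (or near) the face manifold, which is the stated reason that motion prediction in FLAME space is more stable than free-form Gaussian deformation.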
📝 Abstract
Audio-driven 3D talking head synthesis has advanced rapidly with Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS). By leveraging rich pre-trained priors, few-shot methods enable instant personalization from just a few seconds of video. However, under expressive facial motion, existing few-shot approaches often suffer from geometric instability and audio-emotion mismatch, highlighting the need for more effective emotion-aware motion modeling. In this work, we present EmoTaG, a few-shot emotion-aware 3D talking head synthesis framework built on the Pretrain-and-Adapt paradigm. Our key insight is to reformulate motion prediction in a structured FLAME parameter space rather than directly deforming 3D Gaussians, thereby introducing explicit geometric priors that improve motion stability. Building on this, we propose a Gated Residual Motion Network (GRMN) that captures emotional prosody from the audio while supplementing the head-pose and upper-face cues that audio does not provide, enabling expressive and coherent motion generation. Extensive experiments demonstrate that EmoTaG achieves state-of-the-art performance in emotional expressiveness, lip synchronization, visual realism, and motion stability.
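The abstract does not detail the GRMN's internals, but a minimal gated residual fusion block consistent with its description might look like the following PyTorch sketch. The layer layout, feature dimensions, and the exact gating form are assumptions for illustration, not the authors' architecture.

```python
import torch
import torch.nn as nn

class GatedResidualFusion(nn.Module):
    """Hypothetical gated residual block in the spirit of the GRMN described
    above; not the authors' implementation."""

    def __init__(self, audio_dim=256, cue_dim=64, motion_dim=56):
        super().__init__()
        # Base motion driven by audio (lip articulation, emotional prosody).
        self.audio_to_motion = nn.Linear(audio_dim, motion_dim)
        # Residual motion from cues audio cannot carry
        # (head pose, upper-face motion such as brows and blinks).
        self.cue_to_residual = nn.Linear(cue_dim, motion_dim)
        # Per-channel gate deciding how much of the residual to admit.
        self.gate = nn.Sequential(
            nn.Linear(audio_dim + cue_dim, motion_dim),
            nn.Sigmoid(),
        )

    def forward(self, audio_feat, cue_feat):
        # audio_feat: (B, T, audio_dim)  per-frame audio features
        # cue_feat:   (B, T, cue_dim)    per-frame pose / upper-face features
        base = self.audio_to_motion(audio_feat)
        residual = self.cue_to_residual(cue_feat)
        g = self.gate(torch.cat([audio_feat, cue_feat], dim=-1))
        return base + g * residual  # gated residual motion in parameter space

fusion = GatedResidualFusion()
out = fusion(torch.randn(2, 100, 256), torch.randn(2, 100, 64))
print(out.shape)  # torch.Size([2, 100, 56])
```

A residual-plus-gate form of this kind would let a pretrained audio-to-motion path remain intact while an adapter learns when to trust the supplementary cues, which fits the pretrain-and-adapt framing the paper describes.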