EmoHead: Emotional Talking Head via Manipulating Semantic Expression Parameters

📅 2025-03-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses emotion-controllable audio-driven talking-head video generation, where abstract and ambiguous emotional semantics make precise facial-expression control difficult. The authors propose an audio-expression mapping module that can be specified by an emotion label, combined with a pretrained hyperplane whose normal direction is probed to refine facial movements, enabling fine-grained, disentangled modulation of neural radiance fields (NeRFs) via interpretable semantic expression parameters. The method jointly models audio and facial expressions while explicitly disentangling semantic emotion parameters, supporting emotion-consistent head-animation synthesis within the NeRF rendering framework. Experiments show improved expression reconstruction quality and emotion consistency across multi-emotion control tasks, along with strong visual quality and emotion controllability.

📝 Abstract
Generating emotion-specific talking head videos from audio input is an important and complex challenge for human-machine interaction. However, emotion is a highly abstract concept with ambiguous boundaries, and it necessitates disentangled expression parameters to generate emotionally expressive talking head videos. In this work, we present EmoHead to synthesize talking head videos via semantic expression parameters. To predict expression parameters for arbitrary audio input, we apply an audio-expression module that can be specified by an emotion tag. This module aims to enhance the correlation from audio input across various emotions. Furthermore, we leverage a pre-trained hyperplane to refine facial movements by probing along its vertical (normal) direction. Finally, the refined expression parameters regularize neural radiance fields and facilitate the emotion-consistent generation of talking head videos. Experimental results demonstrate that semantic expression parameters lead to better reconstruction quality and controllability.
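The hyperplane refinement described in the abstract resembles semantic editing in a latent space: expression parameters are nudged along the normal of a pretrained separating hyperplane. A minimal sketch of that probing step, assuming the hyperplane is given by its weight vector `w` (the function name `refine_expression` and the step size `alpha` are illustrative, not from the paper):

```python
import numpy as np

def refine_expression(expr, w, alpha=0.5):
    """Nudge expression parameters along the unit normal of a
    pretrained emotion hyperplane w.x + b = 0. This is a common
    semantic-editing trick; the paper's exact procedure may differ."""
    n = w / np.linalg.norm(w)   # unit normal of the hyperplane
    return expr + alpha * n     # probe along the vertical (normal) direction
```

A positive `alpha` moves the parameters toward the emotion side of the hyperplane, a negative one away from it; the refined parameters would then condition the NeRF renderer.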
Problem

Research questions and friction points this paper is trying to address.

Generate emotion-specific talking head videos from audio input
Disentangle expression parameters for emotional video synthesis
Enhance audio-emotion correlation for consistent facial movements
Innovation

Methods, ideas, or system contributions that make the work stand out.

EmoHead uses semantic expression parameters
Audio-expression module enhances emotion correlation
Pre-trained hyperplane refines facial movements
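To make the label-specifiable audio-expression mapping concrete, here is a toy sketch in which the emotion tag selects an embedding that conditions a shared linear map over audio features. The class name, tag set, dimensions, and random weights are all illustrative assumptions; the paper's module is a learned network:

```python
import numpy as np

EMOTIONS = ["neutral", "happy", "angry", "sad"]  # example tag set (assumed)

class AudioExpressionModule:
    """Toy sketch of an emotion-tag-conditioned audio-to-expression
    mapping: the tag picks an embedding that is concatenated with the
    audio features before a shared linear projection."""
    def __init__(self, audio_dim=64, expr_dim=52, emb_dim=8, seed=0):
        rng = np.random.default_rng(seed)
        # one learnable embedding per emotion tag (random here for illustration)
        self.tag_emb = {e: rng.standard_normal(emb_dim) for e in EMOTIONS}
        self.W = rng.standard_normal((expr_dim, audio_dim + emb_dim)) * 0.01
        self.b = np.zeros(expr_dim)

    def predict(self, audio_feat, emotion):
        # condition the shared map on the selected emotion embedding
        x = np.concatenate([audio_feat, self.tag_emb[emotion]])
        return self.W @ x + self.b  # predicted expression parameters
```

The same audio clip yields different expression parameters under different tags, which is the behavior the label-specifiable module is meant to provide before the hyperplane refinement and NeRF rendering stages.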