EmoHead: Emotional Talking Head via Manipulating Semantic Expression Parameters

📅 2025-03-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses emotion-controllable audio-driven talking-head video generation, where abstract and ambiguous emotional semantics make precise facial-expression control difficult. The authors propose an audio-expression mapping module that can be specified by an emotion label, combined with a pretrained hyperplane whose normal direction is probed to refine facial movements, enabling fine-grained, disentangled modulation of neural radiance fields (NeRFs) via interpretable semantic expression parameters. The method jointly models audio and facial expressions while explicitly disentangling semantic emotion parameters, supporting emotion-consistent head-animation synthesis within the NeRF rendering framework. Experiments show improved expression reconstruction quality and emotion consistency across multi-emotion control tasks, along with strong visual quality and emotion controllability.

📝 Abstract
Generating emotion-specific talking head videos from audio input is an important and complex challenge for human-machine interaction. However, emotion is a highly abstract concept with ambiguous boundaries, and it necessitates disentangled expression parameters to generate emotionally expressive talking head videos. In this work, we present EmoHead to synthesize talking head videos via semantic expression parameters. To predict expression parameters for arbitrary audio input, we apply an audio-expression module that can be specified by an emotion tag. This module aims to enhance the correlation from audio input across various emotions. Furthermore, we leverage a pre-trained hyperplane to refine facial movements by probing along its vertical (normal) direction. Finally, the refined expression parameters regularize neural radiance fields and facilitate the emotion-consistent generation of talking head videos. Experimental results demonstrate that semantic expression parameters lead to better reconstruction quality and controllability.
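The hyperplane refinement described in the abstract resembles semantic editing in a latent space: expression parameters are nudged along the normal of a pretrained separating hyperplane. A minimal sketch of that probing step, assuming the hyperplane is given by its weight vector `w` (the function name `refine_expression` and the step size `alpha` are illustrative, not from the paper):

```python
import numpy as np

def refine_expression(expr, w, alpha=0.5):
    """Nudge expression parameters along the unit normal of a
    pretrained emotion hyperplane w.x + b = 0. This is a common
    semantic-editing trick; the paper's exact procedure may differ."""
    n = w / np.linalg.norm(w)   # unit normal of the hyperplane
    return expr + alpha * n     # probe along the vertical (normal) direction
```

A positive `alpha` moves the parameters toward the emotion side of the hyperplane, a negative one away from it; the refined parameters would then condition the NeRF renderer.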
Problem

Research questions and friction points this paper is trying to address.

Generate emotion-specific talking head videos from audio input
Disentangle expression parameters for emotional video synthesis
Enhance audio-emotion correlation for consistent facial movements
Innovation

Methods, ideas, or system contributions that make the work stand out.

EmoHead uses semantic expression parameters
Audio-expression module enhances emotion correlation
Pre-trained hyperplane refines facial movements
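To make the label-specifiable audio-expression mapping concrete, here is a toy sketch in which the emotion tag selects an embedding that conditions a shared linear map over audio features. The class name, tag set, dimensions, and random weights are all illustrative assumptions; the paper's module is a learned network:

```python
import numpy as np

EMOTIONS = ["neutral", "happy", "angry", "sad"]  # example tag set (assumed)

class AudioExpressionModule:
    """Toy sketch of an emotion-tag-conditioned audio-to-expression
    mapping: the tag picks an embedding that is concatenated with the
    audio features before a shared linear projection."""
    def __init__(self, audio_dim=64, expr_dim=52, emb_dim=8, seed=0):
        rng = np.random.default_rng(seed)
        # one learnable embedding per emotion tag (random here for illustration)
        self.tag_emb = {e: rng.standard_normal(emb_dim) for e in EMOTIONS}
        self.W = rng.standard_normal((expr_dim, audio_dim + emb_dim)) * 0.01
        self.b = np.zeros(expr_dim)

    def predict(self, audio_feat, emotion):
        # condition the shared map on the selected emotion embedding
        x = np.concatenate([audio_feat, self.tag_emb[emotion]])
        return self.W @ x + self.b  # predicted expression parameters
```

The same audio clip yields different expression parameters under different tags, which is the behavior the label-specifiable module is meant to provide before the hyperplane refinement and NeRF rendering stages.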