Text-Driven Emotionally Continuous Talking Face Generation

📅 2026-03-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a limitation of existing talking-face generation methods, which often produce faces with a single static emotion and fail to model natural, continuous emotional dynamics. To overcome this, we introduce a novel task of emotionally coherent talking-face synthesis, driven jointly by input text and a dynamic emotion description, to generate realistic videos in which facial expressions align with both the speech content and the evolving affective state. We propose the first temporally dense emotion fluctuation modeling mechanism and develop a tailored architecture, Temporal-Intensive Emotion Modulated Talking Face Generation (TIE-TFG), that incorporates temporal emotion modulation to keep facial dynamics finely synchronized with the textual semantics. Experimental results demonstrate that our approach consistently yields high-quality videos with smooth emotional transitions and lifelike facial movements across diverse affective conditions.

📝 Abstract
Talking Face Generation (TFG) strives to create realistic and emotionally expressive digital faces. While previous TFG works have mastered the creation of naturalistic facial movements, they typically express a fixed target emotion in synthetic videos and lack the ability to exhibit continuously changing and natural expressions like humans do when conveying information. To synthesize realistic videos, we propose a novel task called Emotionally Continuous Talking Face Generation (EC-TFG), which takes a text segment and an emotion description with varying emotions as driving data, aiming to generate a video where the person speaks the text while reflecting the emotional changes within the description. Alongside this, we introduce a customized model, i.e., Temporal-Intensive Emotion Modulated Talking Face Generation (TIE-TFG), which innovatively manages dynamic emotional variations by employing Temporal-Intensive Emotion Fluctuation Modeling, allowing it to provide emotion variation sequences corresponding to the input text to drive continuous facial expression changes in synthesized videos. Extensive evaluations demonstrate our method's exceptional ability to produce smooth emotion transitions and uphold high-quality visuals and motion authenticity across diverse emotional states.
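The abstract's core idea, an "emotion variation sequence" that drives continuous expression changes frame by frame, can be illustrated with a minimal sketch. The function below densifies a sparse emotion description (a few keyframed intensities) into a per-frame intensity curve via linear interpolation. All names and the interpolation scheme are hypothetical for illustration; they are not the authors' TIE-TFG implementation, which the abstract does not detail.

```python
def emotion_curve(keyframes, num_frames):
    """Densify sparse emotion keyframes into a per-frame intensity curve.

    keyframes  -- sorted list of (frame_index, intensity) pairs, e.g. a
                  sad-to-happy transition described at a few anchor frames
                  (hypothetical input format, not the paper's).
    num_frames -- total number of video frames to cover.
    Returns a list of num_frames intensities in the same scale as the input.
    """
    curve = []
    for f in range(num_frames):
        if f <= keyframes[0][0]:
            # Before the first keyframe: hold the initial intensity.
            curve.append(keyframes[0][1])
        elif f >= keyframes[-1][0]:
            # After the last keyframe: hold the final intensity.
            curve.append(keyframes[-1][1])
        else:
            # Linearly interpolate between the surrounding keyframes.
            for (f0, v0), (f1, v1) in zip(keyframes, keyframes[1:]):
                if f0 <= f <= f1:
                    t = (f - f0) / (f1 - f0)
                    curve.append(v0 + t * (v1 - v0))
                    break
    return curve


# Example: a neutral-to-happy ramp over an 11-frame clip.
ramp = emotion_curve([(0, 0.0), (10, 1.0)], 11)
```

In a real pipeline, each per-frame intensity would modulate an emotion embedding fed to the face renderer, which is what allows the synthesized expression to drift smoothly rather than snap between fixed emotions.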
Problem

Research questions and friction points this paper is trying to address.

Talking Face Generation
Emotionally Continuous
Facial Expression
Emotion Variation
Text-Driven
Innovation

Methods, ideas, or system contributions that make the work stand out.

Emotionally Continuous Talking Face Generation
Temporal-Intensive Emotion Fluctuation Modeling
Dynamic Emotional Variation
Talking Face Generation
Emotion-Aware Facial Animation
Authors
Hao Yang (Harbin Institute of Technology)
Yanyan Zhao (Harbin Institute of Technology, Natural Language Processing)
Tian Zheng (Harbin Institute of Technology)
Hongbo Zhang (Harbin Institute of Technology)
Bichen Wang (Harbin Institute of Technology)
Di Wu
Xing Fu (Ant Group)
Xuda Zhi (SERES)
Yongbo Huang (SERES)
Hao He (SERES)