Latent Behavior Diffusion for Sequential Reaction Generation in Dyadic Setting

📅 2025-05-12
🏛️ International Conference on Pattern Recognition
🤖 AI Summary
To address the lack of diversity and contextual consistency in responsive facial motion generation for dyadic conversations, this paper proposes the first latent-space behavioral diffusion framework. The method comprises a context-aware multimodal autoencoder and a non-autoregressive, latent-conditioned diffusion generator. The autoencoder maps multimodal conversational context—including speech, text, and interlocutor facial motions—into compact, behaviorally meaningful latent representations. The diffusion generator then models long-range temporal dependencies efficiently within this latent space, enabling high-fidelity, diverse, and socially coordinated facial response synthesis. Evaluated on dyadic responsive motion generation, the approach significantly outperforms existing state-of-the-art methods, achieving measurable improvements in motion realism, emotional expressiveness, and richness of motion dynamics. This work establishes a scalable, diffusion-based generative paradigm for natural human–computer interaction.

📝 Abstract
The dyadic reaction generation task involves synthesizing responsive facial reactions that align closely with the behaviors of a conversational partner, enhancing the naturalness and effectiveness of human-like interaction simulations. This paper introduces a novel approach, the Latent Behavior Diffusion Model, comprising a context-aware autoencoder and a diffusion-based conditional generator, which together address the challenge of generating diverse and contextually relevant facial reactions from input speaker behaviors. The autoencoder compresses high-dimensional input features into a concise latent representation that captures dynamic patterns in listener reactions, facilitating more expressive and contextually appropriate reaction synthesis. The diffusion-based conditional generator operates on the latent space produced by the autoencoder to predict realistic facial reactions in a non-autoregressive manner. This approach allows for generating diverse facial reactions that reflect subtle variations in conversational cues and emotional states. Experimental results demonstrate the effectiveness of the approach, achieving superior performance in dyadic reaction synthesis tasks compared to existing methods.
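The abstract's first component, the context-aware autoencoder, can be pictured as a bottleneck that maps a high-dimensional multimodal context window into a compact behavioral latent. The sketch below is a hypothetical stand-in for the learned network: linear projections replace the trained encoder and decoder, and all dimensions and names are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

FEAT_DIM = 256   # concatenated multimodal feature size (assumed)
LATENT_DIM = 16  # compact behavioral latent size (assumed)

# Stand-ins for learned encoder/decoder weights.
W_enc = rng.standard_normal((FEAT_DIM, LATENT_DIM)) / np.sqrt(FEAT_DIM)
W_dec = rng.standard_normal((LATENT_DIM, FEAT_DIM)) / np.sqrt(LATENT_DIM)

def encode(x):
    """Map a (T, FEAT_DIM) multimodal context sequence to (T, LATENT_DIM) latents."""
    return x @ W_enc

def decode(z):
    """Map behavioral latents back to the facial-motion feature space."""
    return z @ W_dec

# A 50-frame window of speaker speech/text/facial features (random placeholder).
context = rng.standard_normal((50, FEAT_DIM))
z = encode(context)
recon = decode(z)
print(z.shape, recon.shape)  # (50, 16) (50, 256)
```

In the actual method the bottleneck is trained so that the latent preserves behaviorally meaningful structure; the point of the sketch is only the shape of the computation: a long, wide feature sequence in, a short, narrow latent sequence out.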
Problem

Research questions and friction points this paper is trying to address.

Generating diverse facial reactions from speaker behaviors
Enhancing naturalness in dyadic interaction simulations
Compressing high-dimensional input for realistic reaction synthesis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Latent Behavior Diffusion Model for diverse reactions
Context-aware autoencoder compresses high-dimensional features
Diffusion-based generator predicts realistic facial reactions
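The second innovation, the non-autoregressive diffusion generator, denoises an entire latent reaction sequence in one reverse pass rather than frame by frame. The sketch below follows the standard DDPM schedule and reverse update; the epsilon-predictor is a placeholder function, not the paper's trained conditional network, and the step count and dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

T_STEPS = 100              # number of diffusion steps (assumed)
SEQ_LEN, LATENT_DIM = 50, 16

# Standard linear DDPM noise schedule.
betas = np.linspace(1e-4, 0.02, T_STEPS)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def denoiser(z_t, t, context):
    """Placeholder epsilon-predictor; the paper's model would be a trained
    network conditioned on the speaker-context latent."""
    return 0.1 * z_t + 0.01 * context  # arbitrary stand-in

def sample(context):
    """Generate a whole latent reaction sequence in one reverse pass
    (non-autoregressive: every frame is denoised jointly)."""
    z = rng.standard_normal((SEQ_LEN, LATENT_DIM))  # start from pure noise
    for t in reversed(range(T_STEPS)):
        eps = denoiser(z, t, context)
        # DDPM posterior mean for the previous step.
        z = (z - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            z += np.sqrt(betas[t]) * rng.standard_normal(z.shape)
    return z

# Condition on an encoded speaker-context latent (random placeholder here).
context = rng.standard_normal((SEQ_LEN, LATENT_DIM))
reaction_latents = sample(context)
print(reaction_latents.shape)  # (50, 16)
```

Because all frames are refined jointly, sampling cost scales with the number of diffusion steps rather than the sequence length, which is one plausible reading of the paper's efficiency claim for long-range temporal dependencies.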
Minh-Duc Nguyen
CECS, VinUniversity
Hyung-Jeong Yang
Chonnam National University, Gwangju, South Korea
Sooyoung Kim
Chonnam National University, Gwangju, South Korea
Ji-eun Shin
Chonnam National University, Gwangju, South Korea
Seung-won Kim
Chonnam National University, Gwangju, South Korea