Latent Behavior Diffusion for Sequential Reaction Generation in Dyadic Setting

📅 2025-05-12
🏛️ International Conference on Pattern Recognition
🤖 AI Summary
To address the lack of diversity and contextual consistency in responsive facial motion generation for dyadic conversations, this paper proposes the first latent-space behavioral diffusion framework. The method comprises a context-aware multimodal autoencoder and a non-autoregressive, latent-conditioned diffusion generator. The autoencoder maps multimodal conversational context—including speech, text, and interlocutor facial motions—into compact, behaviorally meaningful latent representations. The diffusion generator then models long-range temporal dependencies efficiently within this latent space, enabling high-fidelity, diverse, and socially coordinated facial response synthesis. Evaluated on dyadic responsive motion generation, the approach significantly outperforms existing state-of-the-art methods, achieving measurable improvements in motion realism, emotional expressiveness, and richness of motion dynamics. This work establishes a scalable, diffusion-based generative paradigm for natural human–computer interaction.

📝 Abstract
The dyadic reaction generation task involves synthesizing responsive facial reactions that align closely with the behaviors of a conversational partner, enhancing the naturalness and effectiveness of human-like interaction simulations. This paper introduces a novel approach, the Latent Behavior Diffusion Model, comprising a context-aware autoencoder and a diffusion-based conditional generator, which together address the challenge of generating diverse and contextually relevant facial reactions from input speaker behaviors. The autoencoder compresses high-dimensional input features into a concise latent representation that captures dynamic patterns in listener reactions, facilitating more expressive and contextually appropriate reaction synthesis. The diffusion-based conditional generator operates on the latent space produced by the autoencoder to predict realistic facial reactions in a non-autoregressive manner. This approach allows for generating diverse facial reactions that reflect subtle variations in conversational cues and emotional states. Experimental results demonstrate the effectiveness of the approach, achieving superior performance in dyadic reaction synthesis tasks compared to existing methods.
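The abstract's first component, the context-aware autoencoder, can be pictured as a bottleneck that maps a high-dimensional multimodal context window into a compact behavioral latent. The sketch below is a hypothetical stand-in for the learned network: linear projections replace the trained encoder and decoder, and all dimensions and names are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

FEAT_DIM = 256   # concatenated multimodal feature size (assumed)
LATENT_DIM = 16  # compact behavioral latent size (assumed)

# Stand-ins for learned encoder/decoder weights.
W_enc = rng.standard_normal((FEAT_DIM, LATENT_DIM)) / np.sqrt(FEAT_DIM)
W_dec = rng.standard_normal((LATENT_DIM, FEAT_DIM)) / np.sqrt(LATENT_DIM)

def encode(x):
    """Map a (T, FEAT_DIM) multimodal context sequence to (T, LATENT_DIM) latents."""
    return x @ W_enc

def decode(z):
    """Map behavioral latents back to the facial-motion feature space."""
    return z @ W_dec

# A 50-frame window of speaker speech/text/facial features (random placeholder).
context = rng.standard_normal((50, FEAT_DIM))
z = encode(context)
recon = decode(z)
print(z.shape, recon.shape)  # (50, 16) (50, 256)
```

In the actual method the bottleneck is trained so that the latent preserves behaviorally meaningful structure; the point of the sketch is only the shape of the computation: a long, wide feature sequence in, a short, narrow latent sequence out.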
Problem

Research questions and friction points this paper is trying to address.

Generating diverse facial reactions from speaker behaviors
Enhancing naturalness in dyadic interaction simulations
Compressing high-dimensional input for realistic reaction synthesis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Latent Behavior Diffusion Model for diverse reactions
Context-aware autoencoder compresses high-dimensional features
Diffusion-based generator predicts realistic facial reactions
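The second innovation, the non-autoregressive diffusion generator, denoises an entire latent reaction sequence in one reverse pass rather than frame by frame. The sketch below follows the standard DDPM schedule and reverse update; the epsilon-predictor is a placeholder function, not the paper's trained conditional network, and the step count and dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

T_STEPS = 100              # number of diffusion steps (assumed)
SEQ_LEN, LATENT_DIM = 50, 16

# Standard linear DDPM noise schedule.
betas = np.linspace(1e-4, 0.02, T_STEPS)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def denoiser(z_t, t, context):
    """Placeholder epsilon-predictor; the paper's model would be a trained
    network conditioned on the speaker-context latent."""
    return 0.1 * z_t + 0.01 * context  # arbitrary stand-in

def sample(context):
    """Generate a whole latent reaction sequence in one reverse pass
    (non-autoregressive: every frame is denoised jointly)."""
    z = rng.standard_normal((SEQ_LEN, LATENT_DIM))  # start from pure noise
    for t in reversed(range(T_STEPS)):
        eps = denoiser(z, t, context)
        # DDPM posterior mean for the previous step.
        z = (z - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            z += np.sqrt(betas[t]) * rng.standard_normal(z.shape)
    return z

# Condition on an encoded speaker-context latent (random placeholder here).
context = rng.standard_normal((SEQ_LEN, LATENT_DIM))
reaction_latents = sample(context)
print(reaction_latents.shape)  # (50, 16)
```

Because all frames are refined jointly, sampling cost scales with the number of diffusion steps rather than the sequence length, which is one plausible reading of the paper's efficiency claim for long-range temporal dependencies.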
Minh-Duc Nguyen
CECS, VinUniversity
Hyung-Jeong Yang
Chonnam National University, Gwangju, South Korea
Sooyoung Kim
Chonnam National University, Gwangju, South Korea
Ji-eun Shin
Chonnam National University, Gwangju, South Korea
Seung-won Kim
Chonnam National University, Gwangju, South Korea