REACT 2025: the Third Multiple Appropriate Facial Reaction Generation Challenge

📅 2025-05-22
🤖 AI Summary
This work addresses the problem of generating natural, multimodal listener facial reactions that are diverse, contextually appropriate, photorealistic, and temporally synchronized to speaker audiovisual behavior in spontaneous two-party conversations. Methodologically, we introduce MARS, the first large-scale multimodal listener reaction dataset (137 human-human dyadic interactions comprising 2,856 interaction sessions across five topics), and formulate two complementary tasks: offline generation and online streaming generation. Our approach jointly models audio, visual, and semantic features, integrating generative frameworks (VAE, GAN, diffusion models), temporal alignment techniques, and diversity-enforcing constraints. Key contributions include: (1) public release of the MARS dataset and a unified evaluation benchmark; (2) open-sourcing of baseline implementations; and (3) quantitative evaluation across three dimensions: reaction quality, contextual appropriateness, and audiovisual synchronization. Our framework advances listener reaction generation toward greater realism and practical deployability.

📝 Abstract
In dyadic interactions, a broad spectrum of human facial reactions might be appropriate for responding to each human speaker behaviour. Following the successful organisation of the REACT 2023 and REACT 2024 challenges, we propose the REACT 2025 challenge, encouraging the development and benchmarking of Machine Learning (ML) models that can generate multiple appropriate, diverse, realistic and synchronised human-style facial reactions expressed by human listeners in response to an input stimulus (i.e., the audio-visual behaviours expressed by their corresponding speakers). As a key component of the challenge, we provide participants with the first natural and large-scale multi-modal Multiple Appropriate Facial Reaction Generation (MAFRG) dataset (called MARS), recording 137 human-human dyadic interactions that contain a total of 2,856 interaction sessions covering five different topics. In addition, this paper presents the challenge guidelines and the performance of our baselines on the two proposed sub-challenges: Offline MAFRG and Online MAFRG. The challenge baseline code is publicly available at https://github.com/reactmultimodalchallenge/baseline_react2025.
Problem

Research questions and friction points this paper is trying to address.

Generate diverse realistic facial reactions to speaker behaviors
Develop ML models for synchronized listener facial expressions
Benchmark models using large-scale multi-modal dyadic interaction data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Machine Learning models for facial reactions
Large-scale multi-modal MAFRG dataset
Offline and Online MAFRG sub-challenges
Siyang Song
University of Exeter, United Kingdom
Micol Spitale
Politecnico di Milano, Department of Electronics, Information, and Bioengineering
Human-Robot Interaction, Socially Assistive Robotics, Social Artificial Intelligence
Xiangyu Kong
University of Exeter, United Kingdom
Hengde Zhu
University of Leicester, United Kingdom
Cheng Luo
King Abdullah University of Science and Technology, Saudi Arabia
Cristina Palmero
Royal Academy Of Engineering Research Fellow, King's College London
Gaze Estimation, Human Behavior Analysis, Computer Vision, Machine Learning
Germán Barquero
Universitat de Barcelona, Barcelona, Spain
Sergio Escalera
Prof., ICREA Academy, University of Barcelona, Computer Vision Center, ELLIS & IAPR & AAIA Fellow
Human Behavior Analysis, Machine Learning, Computer Vision, Affective Computing, Social Signal Processing
Michel F. Valstar
University of Nottingham, Nottingham, United Kingdom
Mohamed Daoudi
IMT Nord Europe, Villeneuve d’Ascq, France
Tobias Baur
Fsas Technologies, Human-Centered Artificial Intelligence, University of Augsburg
Affective Computing, Social Signal Processing, Explainable AI, Interactive Machine Learning, Virtual
F. Ringeval
Andrew Howes
University of Exeter
Cognitive Science, Computational Interaction, Human-Computer Interaction
Elisabeth André
Professor of Computer Sciences, Augsburg University
Intelligent User Interfaces, Affective Computing, Social Robotics, Virtual Humans, Social Signal Processing
Hatice Gunes
Full Professor of Affective Intelligence & Robotics, University of Cambridge
Artificial Intelligence, Affective AI, Health AI, AI Fairness, Socially Assistive Robotics