REACT 2025: the Third Multiple Appropriate Facial Reaction Generation Challenge

📅 2025-05-22
🤖 AI Summary
This work addresses the problem of generating natural, multimodal listener facial reactions that are diverse, contextually appropriate, photorealistic, and temporally synchronized to speaker audiovisual behavior in spontaneous two-party conversations. Methodologically, we introduce MARS, the first large-scale multimodal listener reaction dataset (137 human-human dyadic interactions comprising 2,856 interaction sessions across five topics), and formulate two complementary tasks: offline generation and online streaming generation. Our approach jointly models audio, visual, and semantic features, integrating generative frameworks (VAE, GAN, diffusion models), temporal alignment techniques, and diversity-enforcing constraints. Key contributions include: (1) public release of the MARS dataset and a unified evaluation benchmark; (2) open-sourcing of baseline implementations; and (3) quantitative evaluation across three dimensions: reaction quality, contextual appropriateness, and audiovisual synchronization. Our framework advances listener reaction generation toward greater realism and practical deployability.

📝 Abstract
In dyadic interactions, a broad spectrum of human facial reactions might be appropriate for responding to each human speaker behaviour. Following the successful organisation of the REACT 2023 and REACT 2024 challenges, we propose the REACT 2025 challenge, encouraging the development and benchmarking of Machine Learning (ML) models that can generate multiple appropriate, diverse, realistic and synchronised human-style facial reactions expressed by human listeners in response to an input stimulus (i.e., the audio-visual behaviours expressed by their corresponding speakers). As a key component of the challenge, we provide participants with the first natural and large-scale multi-modal Multiple Appropriate Facial Reaction Generation (MAFRG) dataset (called MARS), recording 137 human-human dyadic interactions that contain a total of 2,856 interaction sessions covering five different topics. In addition, this paper presents the challenge guidelines and the performance of our baselines on the two proposed sub-challenges: Offline MAFRG and Online MAFRG. The challenge baseline code is publicly available at https://github.com/reactmultimodalchallenge/baseline_react2025.
Problem

Research questions and friction points this paper is trying to address.

Generate diverse realistic facial reactions to speaker behaviors
Develop ML models for synchronized listener facial expressions
Benchmark models using large-scale multi-modal dyadic interaction data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Machine Learning models for facial reactions
Large-scale multi-modal MAFRG dataset
Offline and Online MAFRG sub-challenges
Siyang Song
University of Exeter, United Kingdom
Micol Spitale
Politecnico di Milano, Department of Electronics, Information, and Bioengineering
Human-Robot Interaction, Socially Assistive Robotics, Social Artificial Intelligence
Xiangyu Kong
University of Exeter, United Kingdom
Hengde Zhu
University of Leicester, United Kingdom
Cheng Luo
King Abdullah University of Science and Technology, Saudi Arabia
Cristina Palmero
Royal Academy Of Engineering Research Fellow, King's College London
Gaze Estimation, Human Behavior Analysis, Computer Vision, Machine Learning
Germán Barquero
Universitat de Barcelona, Barcelona, Spain
Sergio Escalera
Prof., ICREA Academy, University of Barcelona, Computer Vision Center, ELLIS & IAPR & AAIA Fellow
Human Behavior Analysis, Machine Learning, Computer Vision, Affective Computing, Social Signal Processing
Michel F. Valstar
University of Nottingham, Nottingham, United Kingdom
Mohamed Daoudi
IMT Nord Europe, Villeneuve d’Ascq, France
Tobias Baur
Fsas Technologies, Human-Centered Artificial Intelligence, University of Augsburg
Affective Computing, Social Signal Processing, Explainable AI, Interactive Machine Learning, Virtual
F. Ringeval
Andrew Howes
University of Exeter
Cognitive Science, Computational Interaction, Human-Computer Interaction
Elisabeth André
Professor of Computer Sciences, Augsburg University
Intelligent User Interfaces, Affective Computing, Social Robotics, Virtual Humans, Social Signal Processing
Hatice Gunes
Full Professor of Affective Intelligence & Robotics, University of Cambridge
Artificial Intelligence, Affective AI, Health AI, AI Fairness, Socially Assistive Robotics