🤖 AI Summary
This study addresses the limitation of existing dialogue emotion modeling approaches, which predominantly focus on individual speakers and fail to capture the dynamic coupling inherent in dyadic interactions. To bridge this gap, we introduce Hume-DaiKon, the first multilingual multimodal dataset specifically designed for dyadic conversations, and establish a unified benchmark comprising three subtasks: directional interpersonal influence prediction, turn-taking prediction, and continuous rapport trajectory forecasting. Employing multimodal fusion and temporal modeling techniques, we evaluate baseline models under standard splits using Concordance Correlation Coefficient (CCC), Pearson correlation, Macro-F1, and Mean Absolute Error (MAE). Initial results show moderate performance in influence prediction (CCC=0.40), turn-taking prediction (Macro-F1=0.66, MAE=1.50 seconds), and rapport trajectory modeling (CCC=0.68), highlighting both the challenges and promise of bidirectional dynamic modeling in interpersonal dialogue.
📝 Abstract
The 2026 ACII Dyadic Conversations (ACII-DaiKon) Workshop & Challenge introduces a benchmark for modeling interpersonal affect and social dynamics in dyadic conversations. Although conversational affect modeling has advanced rapidly, most benchmarks remain speaker-centric and underrepresent coupled, time-evolving processes between partners, including directional influence, conversational timing coordination, and rapport development. To address this gap, ACII-DaiKon presents three coordinated sub-challenges built on a shared dataset: (1) directional interpersonal influence prediction, (2) turn-taking prediction (next-speaker and time-to-next-speech), and (3) rapport trajectory prediction across full interactions.
The challenge is built on the Hume-DaiKon dataset, comprising 945 dyadic conversations (743.4 hours of audiovisual data) collected under naturalistic conditions across five languages. The benchmark supports multimodal modeling, temporal reasoning, and cross-context generalization through fixed train/validation/test splits, standardized metrics, and released baseline systems. Evaluation uses Concordance Correlation Coefficient (CCC), Pearson correlation, Macro-F1, and Mean Absolute Error (MAE) depending on the sub-challenge.
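Three of the metrics named above can be sketched in a few lines for the continuous-valued sub-challenges. This is a minimal illustration, not the challenge's official scoring code: it assumes the standard textbook definitions of CCC (with population variances), Pearson correlation, and MAE, and the function names `ccc`, `pearson`, and `mae` are hypothetical.

```python
from math import sqrt
from statistics import fmean


def _moments(pred, gold):
    """Return means, population variances, and population covariance."""
    if len(pred) != len(gold) or not pred:
        raise ValueError("pred and gold must be non-empty and equally long")
    mp, mg = fmean(pred), fmean(gold)
    var_p = fmean([(p - mp) ** 2 for p in pred])
    var_g = fmean([(g - mg) ** 2 for g in gold])
    cov = fmean([(p - mp) * (g - mg) for p, g in zip(pred, gold)])
    return mp, mg, var_p, var_g, cov


def ccc(pred, gold):
    """Concordance Correlation Coefficient (standard definition, assumed here)."""
    mp, mg, var_p, var_g, cov = _moments(pred, gold)
    denom = var_p + var_g + (mp - mg) ** 2
    # Undefined when both series are constant and equal; report NaN.
    return 2 * cov / denom if denom else float("nan")


def pearson(pred, gold):
    """Pearson correlation; insensitive to shifts and rescaling of pred."""
    _, _, var_p, var_g, cov = _moments(pred, gold)
    denom = sqrt(var_p * var_g)
    return cov / denom if denom else float("nan")


def mae(pred, gold):
    """Mean Absolute Error, e.g. in seconds for time-to-next-speech."""
    if len(pred) != len(gold) or not pred:
        raise ValueError("pred and gold must be non-empty and equally long")
    return fmean([abs(p - g) for p, g in zip(pred, gold)])
```

Under these definitions, a prediction that tracks the target perfectly but with a constant offset, such as `pred = [2, 3, 4]` against `gold = [1, 2, 3]`, has a Pearson correlation of 1 but a CCC of 4/7 (about 0.57). CCC penalizes bias in mean and scale, which is one plausible reason a benchmark would report both.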
Baseline experiments establish initial reference performance, with best test results of 0.40 CCC and 0.50 Pearson for influence prediction, 0.66 Macro-F1 and 1.50 s MAE for turn-taking, and 0.68 CCC and 0.70 Pearson for rapport trajectory modeling. These results indicate that while current methods capture coarse dyadic patterns, robust modeling of directional dependence and long-horizon interpersonal dynamics remains challenging. The workshop provides a shared platform for rigorous comparison and cross-disciplinary discussion on data validity, evaluation protocols, and culturally aware modeling for dyadic interaction.