🤖 AI Summary
This study addresses the challenge of inaccurate timing in topic transitions during robot-moderated group discussions. We propose the first systematic framework for modeling “topic transition appropriateness.” Methodologically, we integrate multimodal nonverbal cues—including prosodic features, facial expressions, and body gestures—to train binary classifiers: LSTM-based models for sequential modeling, and SVM and Random Forest models for non-sequential feature fusion. Key findings reveal that acoustic features alone achieve performance nearly equivalent to full-modality fusion, demonstrating strong robustness. Evaluated on a novel, real-world dataset of robot-moderated group discussions, our models significantly outperform rule-based baselines in detecting inappropriate transitions. The annotated dataset is publicly released. This work advances human-robot collaborative discourse by delivering an interpretable, deployable decision-support system that enhances discussion coherence and participant engagement.
📝 Abstract
Robot-moderated group discussions have the potential to facilitate engaging and productive interactions among human participants. Previous work on topic management in conversational agents has predominantly focused on human engagement and topic personalization, with the agent taking an active role in the discussion. Studies have also shown the usefulness of including robots in groups, yet further exploration is needed for robots to learn when to change the topic while facilitating discussions. Accordingly, our work investigates the suitability of machine-learning models and audiovisual non-verbal features for predicting appropriate topic changes. We utilized interactions between a robot moderator and human participants, which we annotated and used to extract acoustic and body-language features. We provide a detailed analysis of the performance of machine-learning approaches using sequential and non-sequential data with different feature sets. The results indicate promising performance in classifying inappropriate topic changes, outperforming rule-based approaches. Additionally, acoustic features alone exhibited performance and robustness comparable to the complete set of multimodal features. Our annotated data is publicly available at https://github.com/ghadj/topic-change-robot-discussions-data-2024.
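The non-sequential setup described above can be sketched as a feature-level fusion followed by a standard classifier. This is a minimal, hypothetical illustration using scikit-learn, not the authors' released code: the feature columns are random placeholders standing in for the acoustic and body-language features the paper actually extracts, and the labels are synthetic.

```python
# Illustrative sketch (NOT the paper's pipeline): fuse per-moment acoustic
# and body-language feature vectors, then train a Random Forest to classify
# each candidate topic-change moment as appropriate (0) or inappropriate (1).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)

# Placeholder data: each row is one candidate topic-change moment.
# n_acoustic / n_body are arbitrary; the real feature sets are in the paper.
n_samples, n_acoustic, n_body = 200, 8, 6
X_acoustic = rng.normal(size=(n_samples, n_acoustic))
X_body = rng.normal(size=(n_samples, n_body))
X = np.hstack([X_acoustic, X_body])       # simple feature-level fusion
y = rng.integers(0, 2, size=n_samples)    # synthetic appropriateness labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print(f"held-out F1: {f1_score(y_te, clf.predict(X_te)):.2f}")
```

Swapping in only the acoustic columns (`X_acoustic`) would mirror the paper's acoustic-only condition, which the results report as performing comparably to full multimodal fusion.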