🤖 AI Summary
This study addresses the automatic detection of Other-Initiated Repair (OIR) requests in dialogue, aiming to help conversational agents recognize signals of comprehension failure and thereby prevent dialogue breakdown. Methodologically, it combines Conversation Analysis with multimodal modeling of Dutch-language interactions, jointly leveraging pretrained text and audio embeddings, linguistic features, and prosodic features. Its key contribution is to show that linguistic and prosodic cues play complementary roles in OIR detection. Experimental results indicate that adding prosodic features significantly improves on the results of pretrained text and audio embeddings. The authors identify visual cues and multilingual, cross-context corpora as future directions for testing robustness and generalizability.
📝 Abstract
Maintaining mutual understanding is key to avoiding breakdowns in human-human conversation, and repair plays a vital role in it, particularly Other-Initiated Repair (OIR), in which one speaker signals trouble and prompts the other to resolve it. However, Conversational Agents (CAs) still fail to recognize user repair initiation, leading to breakdowns or disengagement. This work proposes a multimodal model to automatically detect repair initiation in Dutch dialogues by integrating linguistic and prosodic features grounded in Conversation Analysis. The results show that prosodic cues complement linguistic features and significantly improve on the results obtained with pretrained text and audio embeddings, offering insights into how different features interact. Future directions include incorporating visual cues and exploring multilingual and cross-context corpora to assess the model's robustness and generalizability.