ERR@HRI 2.0 Challenge: Multimodal Detection of Errors and Failures in Human-Robot Conversations

📅 2025-07-17
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
Large language model (LLM)-driven conversational robots frequently exhibit failure behaviors in human-robot interaction (HRI), including intent misinterpretation, premature interruptions, and unresponsiveness. Method: The ERR@HRI 2.0 Challenge provides a 16-hour multimodal dataset of dyadic human-robot conversations with an LLM-powered robot, incorporating facial, speech, and head-motion features. Each interaction carries dual annotations: the presence or absence of robot errors from the system perspective, and the perceived user intention to correct a mismatch between robot behavior and user expectation. Contribution/Results: Participating teams develop machine learning models that detect these failures from the multimodal data, with submissions evaluated on metrics including detection accuracy and false positive rate. The challenge establishes a reproducible data foundation and benchmark for integrating social signal analysis into HRI failure detection, aiming to improve dialogue coherence and user trust.
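The challenge leaves model design to participants; as one illustration of the temporal-fusion idea, a minimal late-fusion baseline might concatenate per-frame facial, speech, and head-motion features and encode each window with a recurrent layer. The PyTorch sketch below is hypothetical: the feature dimensions, class name, and windowing are assumptions, not the dataset's actual schema.

```python
import torch
import torch.nn as nn

class MultimodalFailureDetector(nn.Module):
    """Hypothetical late-fusion baseline: concatenate per-frame facial,
    speech, and head-motion features, encode the sequence with a GRU,
    and emit one logit for 'robot error present in this window'."""

    def __init__(self, facial_dim=17, speech_dim=25, head_dim=6, hidden_dim=64):
        super().__init__()
        fused_dim = facial_dim + speech_dim + head_dim  # assumed feature sizes
        self.encoder = nn.GRU(fused_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, 1)

    def forward(self, facial, speech, head):
        # Each modality: (batch, time, feature_dim), aligned frame by frame.
        fused = torch.cat([facial, speech, head], dim=-1)
        _, last_hidden = self.encoder(fused)            # (1, batch, hidden_dim)
        return self.classifier(last_hidden.squeeze(0))  # (batch, 1) logits

# Example: a batch of four 10-second windows sampled at 30 Hz.
model = MultimodalFailureDetector()
logits = model(torch.randn(4, 300, 17),
               torch.randn(4, 300, 25),
               torch.randn(4, 300, 6))
error_prob = torch.sigmoid(logits)  # per-window failure probability
```

Feature-level fusion keeps the example short; per-modality encoders with attention-based fusion are a common alternative for this kind of task.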

📝 Abstract
The integration of large language models (LLMs) into conversational robots has made human-robot conversations more dynamic. Yet, LLM-powered conversational robots remain prone to errors, e.g., misunderstanding user intent, prematurely interrupting users, or failing to respond altogether. Detecting and addressing these failures is critical for preventing conversational breakdowns, avoiding task disruptions, and sustaining user trust. To tackle this problem, the ERR@HRI 2.0 Challenge provides a multimodal dataset of LLM-powered conversational robot failures during human-robot conversations and encourages researchers to benchmark machine learning models designed to detect robot failures. The dataset includes 16 hours of dyadic human-robot interactions, incorporating facial, speech, and head movement features. Each interaction is annotated with the presence or absence of robot errors from the system perspective, and perceived user intention to correct for a mismatch between robot behavior and user expectation. Participants are invited to form teams and develop machine learning models that detect these failures using multimodal data. Submissions will be evaluated using various performance metrics, including detection accuracy and false positive rate. This challenge represents another key step toward improving failure detection in human-robot interaction through social signal analysis.
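The abstract names detection accuracy and false positive rate as headline metrics. Below is a minimal sketch of how they might be computed for binary segment-level predictions; the challenge's actual evaluation protocol (segmentation, thresholds, any additional metrics) is defined by the organizers and may differ.

```python
import numpy as np

def detection_metrics(y_true, y_pred):
    """Accuracy and false positive rate for binary failure detection.
    y_true, y_pred: 0 (no robot error) or 1 (robot error) per segment."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    accuracy = float((y_true == y_pred).mean())
    negatives = y_true == 0
    # FPR: fraction of error-free segments incorrectly flagged as failures.
    fpr = float((y_pred[negatives] == 1).mean()) if negatives.any() else 0.0
    return accuracy, fpr

acc, fpr = detection_metrics([0, 0, 1, 1, 0], [0, 1, 1, 0, 0])
print(f"accuracy={acc:.2f}, fpr={fpr:.2f}")  # accuracy=0.60, fpr=0.33
```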
Problem

Research questions and friction points this paper is trying to address.

Detect errors in LLM-powered human-robot conversations
Identify multimodal signals of robot failures
Improve failure detection using social signal analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal dataset for robot failure detection
Machine learning models using social signals
Annotation of robot errors and user intentions (see the schema sketch below)
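As a concrete illustration of the dual annotation scheme (system-side error labels plus perceived user intention to correct), a per-segment record might look like the dataclass below. Field names and error-type strings are illustrative assumptions, not the dataset's published schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SegmentAnnotation:
    """Hypothetical record for one annotated interaction segment."""
    start_s: float                # segment start time (seconds)
    end_s: float                  # segment end time (seconds)
    robot_error: bool             # system perspective: did the robot err?
    error_type: Optional[str]     # e.g. "interruption", "no_response" (illustrative)
    user_intent_to_correct: bool  # perceived intent to correct a mismatch

ann = SegmentAnnotation(start_s=12.0, end_s=17.5, robot_error=True,
                        error_type="interruption", user_intent_to_correct=True)
```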