🤖 AI Summary
Existing HRI research relies heavily on self-report questionnaires to assess user enjoyment, an approach that fails to capture the dynamic, turn-by-turn nature of human–robot dialogue. To address this limitation, we propose HRI CUES, a novel scale for assessing enjoyment in human–robot conversation from an external observer's perspective, supporting fine-grained annotation at both the turn level and the level of the overall interaction. Methodologically, we establish a behavior–semantics coding framework grounded in third-party observation, design a multi-expert collaborative annotation protocol, and apply it to 174 minutes of authentic dialogues between 25 older adults and a large-language-model–driven social robot. Inter-annotator agreement is validated via Cohen’s Kappa (κ = 0.52–0.78). The scale complements subjective self-reporting, holds potential for real-time enjoyment recognition, and is accompanied by an open-source annotated dataset, providing a benchmark for dynamic enjoyment modeling and empathic human–robot interaction.
📝 Abstract
Understanding user enjoyment is crucial in human-robot interaction (HRI), as it can impact interaction quality and influence user acceptance and long-term engagement with robots, particularly in the context of conversations with social robots. However, current assessment methods rely solely on self-reported questionnaires, failing to capture interaction dynamics. This work introduces the Human-Robot Interaction Conversational User Enjoyment Scale (HRI CUES), a novel scale for assessing user enjoyment from an external perspective during conversations with a robot. Developed through rigorous evaluations and discussions among three annotators with relevant expertise, the scale provides a structured framework for assessing enjoyment in each conversation exchange (turn) as well as at the level of the overall interaction. It aims to complement self-reported enjoyment from users and holds the potential for autonomously identifying user enjoyment in real-time HRI. The scale was validated on open-domain dialogues between 25 older adults and a companion robot powered by a large language model, corresponding to 174 minutes of data, and showed moderate to good alignment. The dataset is available online. Additionally, the study offers insights into the nuances and challenges of assessing user enjoyment in robot interactions, and provides guidelines on applying the scale to other domains.