🤖 AI Summary
This study addresses human–robot interaction for service robots operating in dynamic environments (e.g., hotels) with busy users. It is the first to decouple “attention capture” and “intention conveyance” as two distinct objectives and systematically evaluates the efficacy of three modalities—speech, visual display, and micro-gestures. Behavioral experiments were conducted on the Temi platform, using the MonkeyType typing task to quantify user busyness; stimuli included unimodal and multimodal conditions, with evaluation based on both subjective ratings and objective performance metrics (words-per-minute and accuracy). Results show speech achieves optimal attention capture; visual displays significantly outperform other modalities in intention clarity (p < 0.01); micro-gestures yield the weakest effects; and multimodal fusion fails to surpass the best unimodal performance—demonstrating functional irreplaceability among modalities. Based on these findings, we propose a hierarchical multimodal interaction design principle tailored for busy users, offering both theoretical grounding and practical guidelines for human–robot collaboration in service robotics.
📝 Abstract
The growing use of service robots in hospitality highlights the need to understand how robots can communicate effectively with preoccupied customers. This study investigates the efficacy of communication modalities commonly used by service robots, namely acoustic/speech, visual display, and micromotion gestures, in capturing a user's attention and communicating intention in a simulated restaurant scenario. We conducted a two-part user study (N=24) using a Temi robot to simulate delivery tasks, with participants engaged in a typing game (MonkeyType) to emulate a state of busyness. Participants' engagement in the typing game was measured by words per minute (WPM) and typing accuracy. In Part 1, we compared a non-verbal acoustic cue against baseline conditions to assess attention capture during a single-cup delivery task. In Part 2, we evaluated the effectiveness of speech, visual display, micromotion, and their multimodal combination in conveying specific intentions (correct cup selection) during a two-cup delivery task. The results indicate that, while speech is highly effective in capturing attention, it is less successful in clearly communicating intention. Participants rated the visual display as the most effective modality for intention clarity, followed by speech, with micromotion ranked lowest. These findings provide insights into optimizing communication strategies for service robots, highlighting the distinct roles of attention capture and intention communication in enhancing user experience in dynamic hospitality settings.