🤖 AI Summary
This study investigates natural interaction between users and a robotic arm during collaborative assembly in virtual reality (VR), focusing on behavioral differences between instructive and collaborative assembly scenarios. Using a VR simulation combined with the Wizard-of-Oz paradigm, we simultaneously collected multimodal data, including speech, hand gestures, and eye movements. Results show that users prefer deictic "put-that-there" utterances in spatially ambiguous contexts and shift to descriptive language when spatial relations are unambiguous, providing empirical evidence that linguistic strategy depends on spatial grounding. Based on these findings, we propose a spatially aware model of language use and release an open multimodal interaction dataset for VR-based robotic assembly. The work advances the understanding of context-sensitive human–robot collaboration and provides empirical foundations and benchmark data for designing adaptive, natural human–robot interfaces in immersive environments.
📝 Abstract
We explore natural user interactions using a virtual reality simulation of a robot arm for assembly tasks. In a Wizard-of-Oz study, participants completed collaborative LEGO and instructive PCB assembly tasks, with the robot responding under experimenter control. We collected voice, hand tracking, and gaze data from users. Statistical analyses revealed that instructive and collaborative scenarios elicit distinct behaviors and strategies, particularly as tasks progress. Users tended to use "put-that-there" language in spatially ambiguous contexts and more descriptive instructions in spatially clear ones. Our contributions include the identification of natural interaction strategies through analysis of the collected data, along with the supporting dataset, to guide the understanding and design of natural multimodal user interfaces for instructive interaction with systems in virtual reality.
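As a rough illustration of how the voice, hand-tracking, and gaze streams described above might be organized per trial, the sketch below defines a minimal hypothetical record type. The field names, the `Trial`/`Utterance` structures, and the `deictic_ratio` helper are assumptions made for illustration only and are not the actual schema of the released dataset.

```python
# Hypothetical per-trial record for multimodal assembly-interaction data
# (speech, hand tracking, gaze). Structure and names are illustrative
# assumptions, not the released dataset's actual format.
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Utterance:
    t_start: float                          # seconds from trial start
    t_end: float
    text: str                               # transcribed speech, e.g. "put that there"


@dataclass
class HandSample:
    t: float                                # timestamp in seconds
    wrist_pos: Tuple[float, float, float]   # wrist position in VR world coordinates
    is_pointing: bool                       # whether a pointing gesture was detected


@dataclass
class GazeSample:
    t: float
    target: str                             # name of the fixated object, or "none"


@dataclass
class Trial:
    participant_id: str
    condition: str                          # "instructive" (PCB) or "collaborative" (LEGO)
    utterances: List[Utterance] = field(default_factory=list)
    hand_samples: List[HandSample] = field(default_factory=list)
    gaze_samples: List[GazeSample] = field(default_factory=list)

    def deictic_ratio(self) -> float:
        """Fraction of utterances containing deictic 'put-that-there' style words."""
        deictic_words = {"this", "that", "here", "there"}
        if not self.utterances:
            return 0.0
        hits = sum(
            1 for u in self.utterances
            if deictic_words & set(u.text.lower().split())
        )
        return hits / len(self.utterances)
```

A record of this kind would make it straightforward to compare, for instance, the proportion of deictic versus descriptive utterances between the instructive and collaborative conditions.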