🤖 AI Summary
To address safety and collaboration bottlenecks arising from poor intent interpretability and unnatural interaction of non-anthropomorphic industrial robots (e.g., forklifts) in human environments, this paper introduces the “agent-based intent translation” paradigm. It employs a small anthropomorphic proxy robot as an interaction intermediary, integrating large language model (LLM)-driven semantic understanding with biological motion cue modeling to realize multimodal, embodied intent expression via speech, gaze, and anthropomorphic motion. The approach comprises multimodal perception (eye tracking/motion capture), LLM-augmented intent generation, anthropomorphic motion planning, and human-factor-driven evaluation. Experiments demonstrate a 37% improvement in user intent recognition accuracy, a 29% reduction in task response latency, and a 42% decrease in user distraction behaviors. This work provides the first systematic validation of the efficacy and practicality of synergistically combining LLMs with biological motion modeling to enhance human-robot interaction (HRI) for non-anthropomorphic robots.
📝 Abstract
Successful adoption of industrial robots will strongly depend on their ability to safely and efficiently operate in human environments, engage in natural communication, understand their users, and express intentions intuitively while avoiding unnecessary distractions. To achieve this advanced level of Human-Robot Interaction (HRI), robots need to acquire and incorporate knowledge of their users' tasks and environment and adopt multimodal communication approaches with expressive cues that combine speech, movement, gazes, and other modalities. This paper presents several methods to design, enhance, and evaluate expressive HRI systems for non-humanoid industrial robots. We present the concept of a small anthropomorphic robot communicating as a proxy for its non-humanoid host, such as a forklift. We developed a multimodal and LLM-enhanced communication framework for this robot and evaluated it in several lab experiments, using gaze tracking and motion capture to quantify how users perceive the robot and measure the task progress.