Multimodal Interaction and Intention Communication for Industrial Robots

📅 2025-02-25
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address safety and collaboration bottlenecks arising from poor intent interpretability and unnatural interaction of non-anthropomorphic industrial robots (e.g., forklifts) in human environments, this paper introduces the “agent-based intent translation” paradigm. It employs a small anthropomorphic proxy robot as an interaction intermediary, integrating large language model (LLM)-driven semantic understanding with biological motion cue modeling to realize multimodal, embodied intent expression via speech, gaze, and anthropomorphic motion. The approach comprises multimodal perception (eye tracking/motion capture), LLM-augmented intent generation, anthropomorphic motion planning, and human-factor-driven evaluation. Experiments demonstrate a 37% improvement in user intent recognition accuracy, a 29% reduction in task response latency, and a 42% decrease in user distraction behaviors. This work provides the first systematic validation of the efficacy and practicality of synergistically combining LLMs with biological motion modeling to enhance human-robot interaction (HRI) for non-anthropomorphic robots.

Technology Category

Application Category

📝 Abstract
Successful adoption of industrial robots will strongly depend on their ability to safely and efficiently operate in human environments, engage in natural communication, understand their users, and express intentions intuitively while avoiding unnecessary distractions. To achieve this advanced level of Human-Robot Interaction (HRI), robots need to acquire and incorporate knowledge of their users' tasks and environment and adopt multimodal communication approaches with expressive cues that combine speech, movement, gazes, and other modalities. This paper presents several methods to design, enhance, and evaluate expressive HRI systems for non-humanoid industrial robots. We present the concept of a small anthropomorphic robot communicating as a proxy for its non-humanoid host, such as a forklift. We developed a multimodal and LLM-enhanced communication framework for this robot and evaluated it in several lab experiments, using gaze tracking and motion capture to quantify how users perceive the robot and measure the task progress.
Problem

Research questions and friction points this paper is trying to address.

Enhance industrial robots' HRI capabilities
Develop multimodal communication for robots
Evaluate anthropomorphic proxy robot effectiveness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal communication framework
LLM-enhanced interaction
Gaze tracking evaluation