🤖 AI Summary
To address safety and collaboration bottlenecks in human-robot coexistence that arise because humans have little insight into robots' reasoning and therefore limited trust in it, this paper proposes XR-DT, an Extended Reality (XR)-enhanced Digital Twin framework. XR-DT combines a hierarchical digital twin architecture with a chain-of-thought prompting mechanism, integrating human intent, dynamic environmental states, and robot cognition. Technically, it brings together Unity-based simulation, a unified diffusion policy, multimodal large language models, AutoGen-based multi-agent coordination, and human feedback captured through wearable AR devices. Initial experiments demonstrate accurate human and robot trajectory prediction. The framework is designed to improve robustness, interpretability, and trust in dynamic tasks, and to support bidirectional understanding for safe human-robot coexistence.
📝 Abstract
As mobile robots increasingly operate alongside humans in shared workspaces, ensuring safe, efficient, and interpretable Human-Robot Interaction (HRI) has become a pressing challenge. While substantial progress has been made in human behavior prediction, limited attention has been paid to how humans perceive, interpret, and trust robots' inferences, which impedes deployment in safety-critical and socially embedded environments. This paper presents XR-DT, an eXtended Reality-enhanced Digital Twin framework for agentic mobile robots that bridges physical and virtual spaces to enable bi-directional understanding between humans and robots. Our hierarchical XR-DT architecture integrates virtual-, augmented-, and mixed-reality layers, fusing real-time sensor data, simulated environments in the Unity game engine, and human feedback captured through wearable AR devices. Within this framework, we design an agentic mobile robot system with a unified diffusion policy for context-aware task adaptation. We further propose a chain-of-thought prompting mechanism that allows multimodal large language models to reason over human instructions and environmental context, and we add an AutoGen-based multi-agent coordination layer to enhance robustness and collaboration in dynamic tasks. Initial experimental results demonstrate accurate human and robot trajectory prediction, validating the XR-DT framework's effectiveness in HRI tasks. By embedding human intention, environmental dynamics, and robot cognition into the XR-DT framework, our system enables interpretable, trustworthy, and adaptive HRI.