EmBARDiment: an Embodied AI Agent for Productivity in XR

📅 2024-08-15

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

career value

200K/year

🤖 AI Summary

Current XR systems rely excessively on explicit voice/text input for LLM-based chatbots, neglecting implicit physiological signals—such as eye gaze and pose—from inward-facing sensors, resulting in high interaction overhead and weak situational awareness. To address this, we propose an embodied XR-LLM agent framework featuring a novel multimodal attention mechanism for implicit intent inference. It integrates real-time eye-tracking, inward-sensor analytics, contextual memory modeling, and a lightweight LLM to enable prompt-free, natural interaction. A user study with N=42 demonstrates statistically significant reductions in cognitive load (p<0.01), a 37% improvement in task completion efficiency, and a 2.8× increase in interaction naturalness. This work overcomes the “prompt dependency” bottleneck of LLMs in XR, establishing a new interaction paradigm grounded in context and embodied evolution.

Technology Category

Application Category

📝 Abstract

XR devices running chat-bots powered by Large Language Models (LLMs) have tremendous potential as always-on agents that can enable much better productivity scenarios. However, screen based chat-bots do not take advantage of the the full-suite of natural inputs available in XR, including inward facing sensor data, instead they over-rely on explicit voice or text prompts, sometimes paired with multi-modal data dropped as part of the query. We propose a solution that leverages an attention framework that derives context implicitly from user actions, eye-gaze, and contextual memory within the XR environment. This minimizes the need for engineered explicit prompts, fostering grounded and intuitive interactions that glean user insights for the chat-bot. Our user studies demonstrate the imminent feasibility and transformative potential of our approach to streamline user interaction in XR with chat-bots, while offering insights for the design of future XR-embodied LLM agents.

Problem

Research questions and friction points this paper is trying to address.

Enhances productivity in XR using embodied AI agents.

Reduces reliance on explicit voice or text prompts.

Utilizes user actions and eye-gaze for intuitive interactions.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages attention framework for implicit context

Utilizes eye-gaze and user actions in XR

Minimizes need for explicit user prompts

🔎 Similar Papers

Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI