JARVIS: A Just-in-Time AR Visual Instruction System for Cross-Reality Task Guidance

2026-04-11
Citations: 0
Influential: 0

AI Summary
This study addresses two limitations. Traditional tutorials force users to switch repeatedly between reading instructions and performing the task, which disrupts workflow and increases cognitive load, and existing augmented reality (AR) guidance systems offer limited support for mixed physical-virtual tasks. The authors define four categories of cross-reality tasks: real-to-real (R2R), real-to-virtual (R2V), virtual-to-real (V2R), and virtual-to-virtual (V2V). They propose a method that uses vision-language models to generate context-aware, step-by-step AR instructions from a single user prompt, with real-time state verification and adaptive visual feedback to coordinate execution across physical and virtual domains. A within-subjects user study (N=14) reports improvements in task success rate, usability, workload, and visualization effectiveness over baselines.

πŸ“ Abstract
Many everyday tasks rely on external tutorials such as manuals and videos, requiring users to constantly switch between reading instructions and performing actions, which disrupts workflow and increases cognitive load. Augmented reality (AR) enables in-situ guidance, while recent advances in large language models (LLMs) and vision-language models (VLMs) make it possible to automatically generate such guidance. However, existing AI-powered AR tutorial systems primarily focus on physical procedural tasks and provide limited support for hybrid physical and virtual workspaces. To address this gap, we conduct a formative study of guidance needs in cross-reality tasks, which we categorize into four types: real-to-real (R2R), real-to-virtual (R2V), virtual-to-real (V2R), and virtual-to-virtual (V2V). The study identifies key requirements for state awareness and cross-reality coordination. Informed by these requirements, we present JARVIS, a VLM-driven AR instruction system that generates contextual, step-by-step guidance from a single prompt, with real-time state verification and adaptive visual feedback. A within-subjects study (N=14) across four domains shows that JARVIS improves usability, workload, success rate, and visualization effectiveness over baselines.
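
The four task types can be read as the product of a source realm and a target realm. The Python sketch below illustrates that reading only. `Realm`, `TaskType`, and `classify` are hypothetical names introduced here, not part of JARVIS, and treating the first letter of each abbreviation as the source and the second as the target is an assumption drawn from the naming.

```python
from enum import Enum


class Realm(Enum):
    """Where an action starts or ends (hypothetical helper type)."""
    REAL = "R"
    VIRTUAL = "V"


class TaskType(Enum):
    """The four cross-reality task types named in the abstract."""
    R2R = "real-to-real"
    R2V = "real-to-virtual"
    V2R = "virtual-to-real"
    V2V = "virtual-to-virtual"


def classify(source: Realm, target: Realm) -> TaskType:
    """Map a (source, target) realm pair to a task type.

    Assumption: the abbreviation is '<source>2<target>'.
    """
    return TaskType[f"{source.value}2{target.value}"]
```

For example, under this assumed reading a step that starts on a physical object and ends in a virtual workspace would be `classify(Realm.REAL, Realm.VIRTUAL)`, which yields `TaskType.R2V`.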
Problem

Research questions and friction points this paper is trying to address.

cross-reality tasks
augmented reality
task guidance
state awareness
hybrid workspaces
Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-reality
vision-language model
augmented reality
just-in-time guidance
state awareness