🤖 AI Summary
Existing proactive AR agents rely on explicit voice interaction, which can disturb the surroundings and cause social discomfort. This work proposes a seamless interaction framework that leverages egocentric vision and multimodal sensing to infer user context in real time, enabling large multimodal models (LMMs) to dynamically decide *when* and *how* to deliver assistance, thereby achieving low-intrusion, contextually appropriate proactive support. The system is deployed on XR headsets with on-device real-time inference. A user study demonstrates that, compared to voice-triggered baselines, our approach significantly reduces interaction burden (*p* < 0.01), improves usability by 32%, and achieves an 89% user preference rate. Our core contribution is the first integration of context-driven, multimodal temporal decision-making into proactive AR interaction, establishing a natural, implicit, and scalable intelligent assistance paradigm.
📝 Abstract
Proactive AR agents promise context-aware assistance, but their interactions often rely on explicit voice prompts or responses, which can be disruptive or socially awkward. We introduce Sensible Agent, a framework designed for unobtrusive interaction with these proactive agents. Sensible Agent dynamically adapts both "what" assistance to offer and, crucially, "how" to deliver it, based on real-time multimodal context sensing. Informed by an expert workshop (n=12) and a data annotation study (n=40), the framework leverages egocentric cameras, multimodal sensing, and Large Multimodal Models (LMMs) to infer context and suggest appropriate actions delivered via minimally intrusive interaction modes. We demonstrate our prototype on an XR headset through a user study (n=10) in both AR and VR scenarios. Results indicate that Sensible Agent significantly reduces perceived interaction effort compared to a voice-prompted baseline, while maintaining high usability and achieving higher user preference.
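To make the "what/how" decision loop concrete, below is a minimal sketch of how an LMM-driven proactive-assistance decision could be structured. All names (`Context`, `query_lmm`, the prompt wording, and the JSON schema) are hypothetical illustrations, not the paper's actual implementation; the real system runs on-device with richer multimodal inputs.

```python
import json
from dataclasses import dataclass


@dataclass
class Context:
    """Snapshot of multimodal context sensed on the headset (hypothetical fields)."""
    scene_description: str   # e.g. produced by an egocentric-camera captioner
    activity: str            # e.g. "grocery shopping", "in a meeting"
    social_setting: str      # e.g. "alone", "conversation nearby"
    noise_level_db: float    # ambient audio level


DECISION_PROMPT = """You are a proactive AR assistant. Given the user's context,
decide IF assistance should be offered now, WHAT to suggest, and HOW to deliver it
with minimal intrusion (e.g. a small visual chip, a subtle audio cue, or nothing).
Respond as JSON with keys: assist (bool), suggestion (str), modality (str).
Context: {context}"""


def query_lmm(prompt: str) -> str:
    """Placeholder for an on-device LMM call; returns a canned response here."""
    return '{"assist": true, "suggestion": "Show the shopping list", "modality": "visual_chip"}'


def decide_assistance(ctx: Context) -> dict:
    """Ask the LMM when/what/how to assist, given the sensed context."""
    prompt = DECISION_PROMPT.format(context=json.dumps(ctx.__dict__))
    decision = json.loads(query_lmm(prompt))
    # Suppress delivery entirely if the model judges the moment inappropriate.
    if not decision.get("assist", False):
        return {"assist": False}
    return decision


if __name__ == "__main__":
    ctx = Context("aisle with produce shelves", "grocery shopping", "alone", 55.0)
    print(decide_assistance(ctx))
```

The key design point this sketch illustrates is that the model outputs a delivery modality alongside the suggestion itself, so the agent can fall back to a subtler channel, or stay silent, when the sensed context makes an interruption inappropriate.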