🤖 AI Summary
To address the difficulties of direct interaction in mixed reality (MR), such as occlusion, spatial clutter, and access to distant objects, this paper proposes an interaction paradigm based on semantic proxies. It maps physical objects to abstract proxies, decoupling interaction from the objects' physical constraints. These AI-enriched proxies encode semantic attributes and hierarchical spatial relationships. They enable skimming, attribute-based filtering, nested group navigation, and complex multi-object selection without introducing new gestures or menu systems. During selection, the system shifts the interaction target from a physical object to its proxy. The approach is demonstrated in three scenarios: office information retrieval, large-scale spatial navigation, and multi-drone control. An expert evaluation suggests that it is useful and usable. The core contribution is an interaction layer that combines semantic understanding of the scene, abstract proxy representation, and MR interaction, which the authors present as a generalizable paradigm for future MR systems.
📝 Abstract
Interacting with real-world objects in Mixed Reality (MR) often proves difficult when those objects are crowded, distant, or partially occluded, hindering straightforward selection and manipulation. We observe that these difficulties stem from performing interaction directly on physical objects, where input is tightly coupled to their physical constraints. Our key insight is to decouple interaction from these constraints by introducing proxies: abstract representations of real-world objects. We embody this concept in Reality Proxy, a system that seamlessly shifts interaction targets from physical objects to their proxies during selection. Beyond facilitating basic selection, Reality Proxy uses AI to enrich proxies with semantic attributes and hierarchical spatial relationships of their corresponding physical objects. This enables novel and previously cumbersome interactions in MR (such as skimming, attribute-based filtering, navigating nested groups, and complex multi-object selections) without requiring new gestures or menu systems. We demonstrate Reality Proxy's versatility across diverse scenarios, including office information retrieval, large-scale spatial navigation, and multi-drone control. An expert evaluation indicates the system's utility and usability, suggesting that proxy-based abstractions offer a powerful and generalizable interaction paradigm for future MR systems.
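The abstract does not say how proxies are represented internally. As a rough illustration only, the sketch below assumes that a proxy is a tree node carrying a dictionary of semantic attributes, with child nodes standing for spatially nested objects. The names `Proxy`, `filter_by`, `up`, and `select`, and the example office scene, are hypothetical and are not Reality Proxy's API. The sketch shows attribute-based filtering, stepping up to an enclosing group, and composing a multi-object selection from partial selections.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable, Iterator


# Hypothetical proxy model (an assumption, not the paper's implementation).
@dataclass(eq=False)  # identity-based equality/hash: each proxy stands for one object
class Proxy:
    """Abstract stand-in for a physical object or a group of objects."""

    name: str
    attributes: dict[str, Any] = field(default_factory=dict)  # semantic attributes
    children: list[Proxy] = field(default_factory=list)  # spatially nested proxies
    parent: Proxy | None = field(default=None, repr=False)  # enclosing group

    def add(self, child: Proxy) -> Proxy:
        # Record the parent/child relationship in both directions and return the child.
        child.parent = self
        self.children.append(child)
        return child

    def walk(self) -> Iterator[Proxy]:
        # Depth-first traversal of this proxy and every proxy nested beneath it.
        yield self
        for child in self.children:
            yield from child.walk()

    def leaves(self) -> list[Proxy]:
        # Proxies without children stand for individual objects rather than groups.
        return [p for p in self.walk() if not p.children]


def filter_by(root: Proxy, predicate: Callable[[dict[str, Any]], bool]) -> list[Proxy]:
    # Attribute-based filtering: keep individual objects whose attributes match.
    return [p for p in root.leaves() if predicate(p.attributes)]


def up(proxy: Proxy) -> Proxy:
    # Nested-group navigation: move to the enclosing group (the root stays put).
    return proxy.parent if proxy.parent is not None else proxy


def select(*parts: list[Proxy]) -> list[Proxy]:
    # Multi-object selection as an ordered union of partial selections, without duplicates.
    chosen: dict[Proxy, None] = {}
    for part in parts:
        for p in part:
            chosen.setdefault(p)
    return list(chosen)


# Hypothetical office scene: a desk and a shelf, each grouping a few objects.
office = Proxy("office")
desk = office.add(Proxy("desk"))
shelf = office.add(Proxy("shelf"))
laptop = desk.add(Proxy("laptop", {"kind": "device"}))
shelf.add(Proxy("book A", {"kind": "book", "year": 2019}))
shelf.add(Proxy("book B", {"kind": "book", "year": 2023}))

# Select everything on the laptop's desk, plus every book from 2020 onward.
recent_books = filter_by(office, lambda a: a.get("kind") == "book" and a.get("year", 0) >= 2020)
selection = select(up(laptop).leaves(), recent_books)
print([p.name for p in selection])  # ['laptop', 'book B']
```

Identity-based equality (`eq=False`) is a deliberate choice in this sketch: two proxies with identical attributes still stand for different physical objects, so selections are de-duplicated by object rather than by value. How proxies and their attributes are actually populated (the paper uses AI for this) is outside the scope of the sketch.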