🤖 AI Summary
Existing mixed reality systems for selecting out-of-reach objects rely on a single cue or on deterministic fusion of multiple cues, so performance degrades when the dominant cue becomes unreliable. This work proposes a probabilistic cue integration framework and instantiates it as Point&Grasp, a technique that combines pointing direction with grasp gestures to infer user intent. To support this approach, the authors collect the Out-of-Reach Grasping (ORG) dataset and train a gesture likelihood model that captures grasping patterns not present in existing in-reach datasets. User studies show that the method improves selection accuracy and speed over single-cue baselines and remains practically effective compared to state-of-the-art techniques across various sources of ambiguity.
📝 Abstract
Selecting out-of-reach objects is a fundamental task in mixed reality (MR). Existing methods rely on a single cue or deterministically fuse multiple cues, leading to performance degradation when the dominant cue becomes unreliable. In this work, we introduce a probabilistic cue integration framework that enables flexible combination of multiple user-generated cues for intent inference. Inspired by natural grasping behavior, we instantiate the framework with pointing direction and grasp gestures as a new interaction technique, Point&Grasp. To this end, we collect the Out-of-Reach Grasping (ORG) dataset to train a robust likelihood model of the gestural cue, which captures grasping patterns not present in existing in-reach datasets. User studies demonstrate that our selection method with cue integration not only improves accuracy and speed over single-cue baselines, but also remains practically effective compared to state-of-the-art methods across various sources of ambiguity. The dataset and code are available at https://github.com/drlxj/point-and-grasp.
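As a rough illustration of what probabilistic cue integration can look like, the sketch below combines per-cue likelihoods into a posterior over candidate objects. It is not the authors' model: it assumes the cues are conditionally independent given the intended target, uses a Gaussian over angular offset as a stand-in pointing likelihood, and takes the gesture likelihoods as given numbers where the paper uses a model trained on ORG. The function names, the `sigma_deg` parameter, and all numeric values are hypothetical.

```python
import math

def fuse_cues(log_likelihoods_per_cue, prior=None):
    """Posterior over candidate objects from per-cue log-likelihoods.

    Assumption: cues are conditionally independent given the target,
    so their log-likelihoods add. The prior must be strictly positive.
    """
    n = len(log_likelihoods_per_cue[0])
    if prior is None:
        prior = [1.0 / n] * n  # uniform prior over candidates
    log_post = [math.log(p) for p in prior]
    for cue in log_likelihoods_per_cue:
        for i, ll in enumerate(cue):
            log_post[i] += ll
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_post)
    unnorm = [math.exp(lp - m) for lp in log_post]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def pointing_log_likelihood(angles_deg, sigma_deg=5.0):
    """Illustrative pointing cue: Gaussian in the angular offset (degrees)
    between the pointing ray and each candidate, up to a constant."""
    return [-0.5 * (a / sigma_deg) ** 2 for a in angles_deg]

# Two candidates almost equally close to the pointing ray.
pointing = pointing_log_likelihood([2.0, 2.5])
# Hypothetical gesture-model outputs: the hand shape fits the second object.
gesture = [math.log(0.2), math.log(0.8)]

posterior_pointing_only = fuse_cues([pointing])
posterior_fused = fuse_cues([pointing, gesture])
```

With these made-up inputs, pointing alone slightly favors the first candidate, while the fused posterior favors the second. This shows how a second cue can resolve a case where the dominant cue is ambiguous, rather than a fixed rule deciding which cue wins.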