🤖 AI Summary
Existing text input methods in extended reality (XR) suffer from low efficiency and high cognitive load, falling short of the performance offered by physical keyboards or touchscreens. This work proposes a novel input paradigm that integrates eye-gaze-based swipe gestures with sustained pinch-hand gestures. By incorporating language modeling during swiping, an in-gesture cancellation mechanism, a low-latency decoder, and spatiotemporal dynamic time warping, the system achieves high accuracy while substantially improving input speed. User studies demonstrate that participants reached a peak typing rate of 64.7 words per minute after 30 training sessions—significantly outperforming conventional key-by-key selection, finger-tap-only, and hand-swipe-only approaches. Moreover, users reported strong preference for the method and exhibited durable learning effects over time.
📝 Abstract
Despite steady progress, text entry in Extended Reality (XR) often remains slower and more effortful than typing on a physical keyboard or touchscreen. We explore a simple idea: use gaze to swipe through a virtual keyboard to provide the fast, low-effort "where", and a manual pinch held throughout the swipe to provide the "when", extending and validating this idea through a series of user studies. We first show that a basic version, comprising a low-latency decoder with spatiotemporal Dynamic Time Warping and fixation filtering, outperforms selecting individual keys sequentially, whether by finger-tapping each key or by gazing at each key while pinching. We then add mid-swipe prediction and in-gesture cancellation, improving words per minute (WPM) without hurting accuracy. We show that this approach is faster than, and preferred over, previous gaze-swipe approaches, finger tapping with prediction, and hand swiping with the same additions. Furthermore, a seven-day, 30-session study demonstrates sustained learning, with peak performance reaching 64.7 WPM.
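To make the decoding step concrete: the abstract describes matching a swipe trajectory against word templates with spatiotemporal Dynamic Time Warping. The sketch below is not the authors' decoder; it is a minimal plain DTW over 2-D point sequences, with a hypothetical toy key layout and candidate vocabulary, illustrating how a noisy gaze path can be scored against ideal key-centre paths.

```python
import math

def dtw_distance(path_a, path_b):
    """Classic dynamic-time-warping distance between two 2-D point
    sequences, using Euclidean distance as the local cost.
    Each path is a list of (x, y) samples."""
    n, m = len(path_a), len(path_b)
    # cost[i][j] = DTW distance between path_a[:i] and path_b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(path_a[i - 1], path_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # skip a template point
                                 cost[i][j - 1],      # skip a gaze sample
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

def decode(gaze_path, templates):
    """Return the candidate word whose ideal key-centre path is
    closest to the observed gaze path under DTW."""
    return min(templates, key=lambda w: dtw_distance(gaze_path, templates[w]))

# Hypothetical toy keyboard: key centres on a unit grid.
keys = {"c": (0, 0), "a": (1, 0), "t": (2, 0), "r": (2, 1)}
templates = {
    "cat": [keys["c"], keys["a"], keys["t"]],
    "car": [keys["c"], keys["a"], keys["r"]],
}
noisy = [(0.1, 0.0), (0.9, 0.1), (2.0, 0.05)]  # gaze drifting toward "t"
print(decode(noisy, templates))  # → "cat"
```

A real decoder of this kind would additionally filter raw gaze into fixations before matching and weight the temporal dimension alongside the spatial one, which is presumably what "spatiotemporal" refers to; those details are omitted here.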