🤖 AI Summary
This work addresses the limitations of existing first-person surface interaction techniques, which suffer from insufficient hand-tracking accuracy and unreliable plane estimation, hindering stable and precise input. To overcome these challenges, we propose a novel approach that fuses hand pose data from a head-mounted device with inertial measurements from a smartwatch. Our method is the first to leverage the complementary strengths of these modalities, combining 3D positional information from vision-based tracking with high-frequency motion cues from IMU sensors, for robust contact detection and multi-class gesture recognition on surfaces. Evaluated with 21 participants, the system significantly outperforms unimodal baselines, enabling high-precision touch tracking and accurate recognition of eight distinct gestures.
📝 Abstract
Mid-air gestures in Extended Reality (XR) often cause fatigue and imprecision. Surface-based interactions offer improved accuracy and comfort, but current egocentric vision methods struggle with limited hand-tracking accuracy and unreliable surface plane estimation. We introduce SurfaceXR, a sensor fusion approach that combines headset-based hand tracking with smartwatch IMU data to enable robust input on everyday surfaces. Our insight is that these modalities are complementary: hand tracking provides 3D positional data, while IMUs capture high-frequency motion cues. A 21-participant study validates SurfaceXR's effectiveness for touch tracking and 8-class gesture recognition, demonstrating significant improvements over single-modality approaches.
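To make the complementarity concrete, here is a minimal sketch of one plausible contact-detection rule: gate on the vision-tracked fingertip being near the estimated surface plane (the "where") and confirm with a high-frequency impact transient in the watch accelerometer (the "when"). Every function name, threshold, and sample rate below is an illustrative assumption, not the paper's actual SurfaceXR pipeline.

```python
import numpy as np

# Hypothetical sketch of the vision + IMU fusion idea described above.
# Rates, thresholds, and the simple AND-style fusion rule are assumptions.

HAND_RATE_HZ = 30      # assumed headset hand-tracking rate
IMU_RATE_HZ = 400      # assumed smartwatch IMU sample rate
PROX_THRESH_M = 0.01   # fingertip-to-plane distance gate (assumed)
IMPACT_THRESH = 2.5    # high-pass accel spike threshold, m/s^2 (assumed)
WINDOW_S = 0.05        # window for associating the two cues (assumed)

def plane_distance(fingertip_xyz, plane_point, plane_normal):
    """Signed distance from a tracked fingertip to the estimated surface plane."""
    n = plane_normal / np.linalg.norm(plane_normal)
    return float(np.dot(fingertip_xyz - plane_point, n))

def highpass_energy(accel_mag, fs=IMU_RATE_HZ, cutoff=20.0):
    """Crude high-pass filter: subtract a moving average so only
    short impact transients survive."""
    win = max(1, int(fs / cutoff))
    kernel = np.ones(win) / win
    low = np.convolve(accel_mag, kernel, mode="same")
    return np.abs(accel_mag - low)

def detect_contacts(hand_t, fingertip_dists, imu_t, accel_mag):
    """Fuse the cues: vision says the finger is *near* the plane,
    an IMU transient says a touch *just happened*.

    hand_t, fingertip_dists : per-frame hand-tracking timestamps and
                              fingertip-to-plane distances (arrays)
    imu_t, accel_mag        : IMU timestamps and acceleration magnitudes
    """
    hp = highpass_energy(np.asarray(accel_mag))
    spikes = np.asarray(imu_t)[hp > IMPACT_THRESH]
    contacts = []
    for t, d in zip(hand_t, fingertip_dists):
        near = abs(d) < PROX_THRESH_M
        impact = np.any(np.abs(spikes - t) < WINDOW_S)
        if near and impact:
            contacts.append(t)
    return contacts
```

A real system would likely replace these hand-tuned thresholds with a learned fusion model, but the sketch shows why neither signal alone suffices: vision localizes the finger on the surface but blurs the moment of impact, while the IMU times the impact precisely but cannot localize it.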