🤖 AI Summary
This work addresses continuous hand pose tracking on commodity smartwatches, which has so far depended on external sensors or custom hardware. The authors propose the first method that uses only the built-in speaker and microphone of off-the-shelf smartwatches to emit inaudible frequency-modulated continuous wave (FMCW) signals and capture their reflections from the hand. A deep-learning model then estimates the 3D positions of 20 finger joints from these acoustic echoes. Because it requires no additional hardware, the approach can run on existing devices. Evaluated across multiple smartwatch models, wearing hands, body postures, noise conditions, and pose-variation protocols, it achieves a mean per-joint position error of 7.87 mm in cross-session tests with device remounting. Performance drops for unseen users and gestures, but the model adapts effectively with lightweight fine-tuning on small amounts of data.
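As a rough illustration of the FMCW sensing idea, the Python sketch below generates one near-ultrasonic chirp and cross-correlates a received frame with it to obtain an echo profile. The sample rate, the 18-21 kHz band, the chirp length, and all function names are assumptions made for illustration; the source does not specify WatchHand's signal parameters or its processing pipeline.

```python
import numpy as np

# All constants below are illustrative assumptions, not values from the paper.
FS = 48_000                  # audio sample rate in Hz
F0, F1 = 18_000.0, 21_000.0  # start and end frequency of the sweep in Hz
CHIRP_LEN = 600              # samples per chirp (12.5 ms at 48 kHz)
SPEED_OF_SOUND = 343.0       # m/s in air at room temperature


def make_chirp(fs=FS, f0=F0, f1=F1, n=CHIRP_LEN):
    """Return one linear up-chirp, windowed to soften its edges."""
    t = np.arange(n) / fs
    sweep_rate = (f1 - f0) / (n / fs)  # Hz per second
    phase = 2.0 * np.pi * (f0 * t + 0.5 * sweep_rate * t**2)
    return np.cos(phase) * np.hanning(n)


def echo_profile(received, chirp):
    """Cross-correlate a received frame with the transmitted chirp.

    Index i of the result is the correlation magnitude at a delay of
    i samples, so peaks mark the arrival times of reflections.
    """
    corr = np.correlate(received, chirp, mode="full")
    # In "full" mode, zero lag sits at index len(chirp) - 1.
    return np.abs(corr[len(chirp) - 1:])


def lag_to_distance(lag_samples, fs=FS, c=SPEED_OF_SOUND):
    """Convert a round-trip delay in samples to a one-way distance in metres."""
    return lag_samples / fs * c / 2.0


if __name__ == "__main__":
    chirp = make_chirp()
    # Simulate a single noise-free reflection delayed by 40 samples.
    received = np.zeros(2 * CHIRP_LEN)
    received[40:40 + CHIRP_LEN] += 0.3 * chirp
    profile = echo_profile(received, chirp)
    peak = int(np.argmax(profile))
    print(peak, round(lag_to_distance(peak), 4))
```

In a real recording the profile would also contain the direct speaker-to-microphone path and ambient noise, and a learned model would presumably consume a sequence of such profiles rather than a single peak.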
📝 Abstract
Tracking hand poses on wrist-wearables enables rich, expressive interactions, yet it remains unavailable on commercial smartwatches because prior implementations rely on external sensors or custom hardware, which limits their real-world applicability. To address this, we present WatchHand, the first continuous 3D hand pose tracking system implemented on off-the-shelf smartwatches using only their built-in speaker and microphone. WatchHand emits inaudible frequency-modulated continuous waves and captures their reflections from the hand. These acoustic signals are processed by a deep-learning model that estimates 3D hand poses for 20 finger joints. We evaluate WatchHand across diverse real-world conditions (multiple smartwatch models, wearing hands, body postures, noise conditions, and pose-variation protocols) and achieve a mean per-joint position error of 7.87 mm in cross-session tests with device remounting. Although performance drops for unseen users or gestures, the model adapts effectively with lightweight fine-tuning on small amounts of data. Overall, WatchHand lowers the barrier to smartwatch-based hand tracking by eliminating additional hardware while enabling robust, always-available interactions on millions of existing devices.
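For readers unfamiliar with the metric, mean per-joint position error is commonly computed as the average Euclidean distance between predicted and ground-truth joint positions. The Python sketch below shows that common definition; the function name `mpjpe` and the array layout (frames, joints, xyz) are assumptions, and the source does not say whether the authors apply any alignment or normalization before measuring the error.

```python
import numpy as np


def mpjpe(pred, target):
    """Mean per-joint position error, in the same unit as the inputs.

    Both arrays are assumed to have shape (frames, joints, 3), for
    example (N, 20, 3) in millimetres. No alignment is applied here.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    if pred.shape != target.shape or pred.shape[-1] != 3:
        raise ValueError("expected two arrays of identical shape (..., 3)")
    # Euclidean distance per joint, then the mean over joints and frames.
    per_joint = np.linalg.norm(pred - target, axis=-1)
    return float(per_joint.mean())


if __name__ == "__main__":
    # Synthetic data: every joint is offset by (3, 4, 0) mm from the truth.
    truth = np.zeros((2, 20, 3))
    estimate = truth + np.array([3.0, 4.0, 0.0])
    print(mpjpe(estimate, truth))
```

Under this definition, a reported error of 7.87 mm would mean that predicted joints lie on average that far from their reference positions.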