Beyond-Voice: Towards Continuous 3D Hand Pose Tracking on Commercial Home Assistant Devices

📅 2023-06-30
🏛️ International Symposium on Information Processing in Sensor Networks
📈 Citations: 5
✨ Influential: 1
🤖 AI Summary
Current household voice assistants rely on voice user interfaces (VUIs), which suffer from poor accessibility, while camera-based alternatives incur high costs and pose significant privacy risks. This work repurposes the speaker-microphone systems of commercial voice assistants as a high-fidelity active sonar for camera-free, continuous 3D hand pose tracking. The approach combines active acoustic sensing, high-resolution range profiles, and a deep learning model that regresses the 3D positions of all 21 finger joint keypoints. The system is evaluated in three realistic home environments with 11 users. It achieves a mean absolute error of 16.47 mm without any training data from the test user, demonstrating cross-user and cross-environment generalization. To our knowledge, this is the first method enabling privacy-preserving 3D hand pose tracking on commodity voice assistant hardware.
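The summary reports accuracy as a mean absolute error over 21 finger joints. Below is a minimal sketch of one plausible way to compute such a figure. It assumes the error is averaged over every x/y/z coordinate of every joint, in millimetres. The paper may instead average per-joint Euclidean distances, or average per frame or per user. The function name `mean_absolute_error` is hypothetical.

```python
from typing import Sequence, Tuple

Joint = Tuple[float, float, float]  # (x, y, z) position of one joint, in millimetres

NUM_JOINTS = 21  # finger-joint keypoints per hand, as stated in the summary


def mean_absolute_error(pred: Sequence[Joint], truth: Sequence[Joint]) -> float:
    """Average |pred - truth| over every coordinate of every joint (mm).

    Assumption: averaging is per coordinate; the paper's exact definition
    is not given in this summary.
    """
    if len(pred) != NUM_JOINTS or len(truth) != NUM_JOINTS:
        raise ValueError(f"expected {NUM_JOINTS} joints in both poses")
    total = 0.0
    for p, t in zip(pred, truth):
        if len(p) != 3 or len(t) != 3:
            raise ValueError("each joint must be an (x, y, z) triple")
        total += sum(abs(a - b) for a, b in zip(p, t))
    return total / (3 * NUM_JOINTS)


# Usage: every predicted joint is off by (3, -6, 9) mm.
truth = [(0.0, 0.0, 0.0)] * NUM_JOINTS
pred = [(3.0, -6.0, 9.0)] * NUM_JOINTS
print(mean_absolute_error(pred, truth))  # 6.0
```

A per-joint Euclidean average would give a larger number for the same poses (here about 11.2 mm per joint). Reported accuracies are therefore only comparable when the averaging rule matches.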
πŸ“ Abstract
The surging popularity of home assistants and their voice user interface (VUI) has made them an ideal central control hub for smart home devices. However, current form factors rely heavily on the VUI, which poses accessibility and usability issues; some of the latest models add cameras and displays, which are costly and raise privacy concerns. These concerns jointly motivate Beyond-Voice, a novel high-fidelity acoustic sensing system that allows commodity home assistant devices to track and reconstruct hand poses continuously. It transforms the home assistant into an active sonar system using its existing onboard microphones and speakers. We feed a high-resolution range profile to a deep learning model that analyzes the motions of multiple body parts and predicts the 3D positions of 21 finger joints, a finer granularity than prior acoustic hand tracking. It operates across different environments and users without the need for personalized training data. A user study with 11 participants in 3 different environments shows that Beyond-Voice tracks joints with an average mean absolute error of 16.47 mm without any training data from the test subject.
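The abstract describes turning the device into an active sonar whose high-resolution range profile feeds the model. Below is a minimal sketch of that generic sonar step. It is not the authors' pipeline and rests on several assumptions that the abstract does not specify:

* a known pseudo-random ±1 probe,
* plain cross-correlation as the range estimator,
* a 48 kHz sample rate, and
* c = 343 m/s.

All function names (`cross_correlate`, `range_profile`, `lag_to_distance_m`, `motion_profile`, `simulate_echoes`) are hypothetical. `simulate_echoes` is a noiseless toy channel with no direct speaker-to-mic path.

```python
import random
from typing import List, Sequence, Tuple

SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly 20 degrees C


def cross_correlate(tx: Sequence[float], rx: Sequence[float], max_lag: int) -> List[float]:
    """Signed correlation of the known probe `tx` with recording `rx` at lags 0..max_lag-1."""
    out = []
    for lag in range(max_lag):
        acc = 0.0
        for i, t in enumerate(tx):
            j = i + lag
            if j >= len(rx):
                break
            acc += t * rx[j]
        out.append(acc)
    return out


def range_profile(tx: Sequence[float], rx: Sequence[float], max_lag: int) -> List[float]:
    """Echo strength per range bin: a peak at lag k is a reflector k samples of round trip away."""
    return [abs(v) for v in cross_correlate(tx, rx, max_lag)]


def lag_to_distance_m(lag: int, fs: float) -> float:
    """Convert a round-trip delay in samples to one-way distance: d = c * (lag / fs) / 2."""
    return SPEED_OF_SOUND_M_S * lag / (2.0 * fs)


def motion_profile(tx: Sequence[float], rx_prev: Sequence[float],
                   rx_curr: Sequence[float], max_lag: int) -> List[float]:
    """Difference of consecutive signed profiles: static echoes cancel, moving ones remain."""
    prev = cross_correlate(tx, rx_prev, max_lag)
    curr = cross_correlate(tx, rx_curr, max_lag)
    return [abs(c - p) for p, c in zip(prev, curr)]


def simulate_echoes(tx: Sequence[float], echoes: Sequence[Tuple[int, float]],
                    length: int) -> List[float]:
    """Toy recording: sum of delayed (samples), attenuated copies of `tx`."""
    rx = [0.0] * length
    for delay, gain in echoes:
        for i, t in enumerate(tx):
            if delay + i < length:
                rx[delay + i] += gain * t
    return rx


if __name__ == "__main__":
    rng = random.Random(0)
    tx = [rng.choice((-1.0, 1.0)) for _ in range(256)]  # hypothetical probe
    fs = 48_000.0
    rx = simulate_echoes(tx, [(140, 0.5)], 600)  # one reflector, 140 samples round trip
    prof = range_profile(tx, rx, 300)
    peak = max(range(len(prof)), key=prof.__getitem__)
    print(peak, round(lag_to_distance_m(peak, fs), 3))  # 140 0.5
```

The sketch differences the signed correlations rather than their magnitudes. This keeps the operation linear, so echoes from static furniture cancel (up to floating-point error). A hand that moves between frames leaves peaks at both its old and new ranges.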
Problem

Research questions and friction points this paper is trying to address.

Continuous 3D hand pose tracking using acoustic sensing
Overcoming voice-only limitations and privacy concerns
Enabling gesture control on commercial home assistants
Innovation

Methods, ideas, or system contributions that make the work stand out.

Acoustic sensing system for hand tracking
Uses existing microphones and speakers
Deep learning model predicts 3D joint positions
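The worked example below gives intuition for what "high-resolution" means for a sonar built from commodity speakers and microphones. It uses two standard quantities: the range resolution of a probe with bandwidth $B$, and the range spacing of one sample at rate $f_s$. The values $B = 6$ kHz and $f_s = 48$ kHz are illustrative assumptions, not figures from the paper.

$$
\Delta d = \frac{c}{2B} = \frac{343\ \text{m/s}}{2 \times 6000\ \text{Hz}} \approx 2.86\ \text{cm},
\qquad
\delta d_{\text{sample}} = \frac{c}{2 f_s} = \frac{343\ \text{m/s}}{2 \times 48000\ \text{Hz}} \approx 3.57\ \text{mm}.
$$

The bandwidth bound limits how well two nearby reflectors can be separated. It does not limit how precisely a single reflector's motion can be tracked. The summary does not say which additional processing Beyond-Voice uses to reach its reported joint accuracy.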
Yin Li
Cornell University
Rohan Reddy
Cornell University
Cheng Zhang
Cornell University
Rajalakshmi Nandakumar
Cornell Tech
Mobile Health, Networks