🤖 AI Summary
Existing hand pose tracking methods are limited in portability and force-sensing capability. This paper proposes a lightweight ring-watch cooperative wearable system that fuses inertial measurements from a finger-worn IMU ring with single-channel surface electromyography (sEMG) signals from the wrist to jointly reconstruct 3D hand pose and estimate individual fingertip contact forces. The authors introduce a dual-branch Transformer architecture with cross-modal attention and incorporate a biomechanics-inspired kinematic constraint loss to improve estimation accuracy and real-time performance. Evaluated on 20 subjects, the system achieves a mean joint position error of 0.57 cm and a fingertip force estimation RMSE of 0.213 (r = 0.76). It has been successfully deployed in a real-time Unity-based virtual interaction system, demonstrating practical utility and robustness for natural, force-aware human-computer interaction.
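The paper's exact network is not reproduced here, but the dual-branch design can be sketched: two modality-specific Transformer encoders process the IMU and sEMG streams, exchange information through cross-modal attention, and feed regression heads for joint positions and per-finger forces. The PyTorch sketch below is a minimal illustration under assumed dimensions and layer counts; all names (`DualBranchFusion`, `imu_dim`, `d_model`, etc.) are hypothetical, not the authors' code.

```python
import torch
import torch.nn as nn

class DualBranchFusion(nn.Module):
    """Illustrative dual-branch Transformer: separate encoders for IMU and
    sEMG windows, fused by cross-modal attention, with pose/force heads."""

    def __init__(self, imu_dim=6, emg_dim=1, d_model=64,
                 n_joints=21, n_fingers=5):
        super().__init__()
        # Per-modality input projections (channel counts are assumptions).
        self.imu_proj = nn.Linear(imu_dim, d_model)
        self.emg_proj = nn.Linear(emg_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.imu_enc = nn.TransformerEncoder(layer, num_layers=2)
        self.emg_enc = nn.TransformerEncoder(layer, num_layers=2)
        # Cross-modal attention: each branch queries the other.
        self.imu_to_emg = nn.MultiheadAttention(d_model, 4, batch_first=True)
        self.emg_to_imu = nn.MultiheadAttention(d_model, 4, batch_first=True)
        self.pose_head = nn.Linear(2 * d_model, n_joints * 3)  # 3D joint positions
        self.force_head = nn.Linear(2 * d_model, n_fingers)    # per-finger forces

    def forward(self, imu, emg):
        # imu: (B, T, imu_dim) accel+gyro; emg: (B, T, emg_dim), time-aligned.
        # NOTE: positional encodings omitted for brevity.
        hi = self.imu_enc(self.imu_proj(imu))
        he = self.emg_enc(self.emg_proj(emg))
        ci, _ = self.imu_to_emg(hi, he, he)  # IMU queries attend to EMG keys/values
        ce, _ = self.emg_to_imu(he, hi, hi)  # EMG queries attend to IMU keys/values
        fused = torch.cat([ci, ce], dim=-1)  # (B, T, 2*d_model), per frame
        return self.pose_head(fused), self.force_head(fused)
```

Calling `model(imu, emg)` on time-aligned windows of shape `(batch, frames, channels)` yields per-frame pose and force predictions; a real deployment would additionally need positional encodings and per-user calibration, which this sketch omits.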
📝 Abstract
Hand pose tracking is essential for advancing applications in human-computer interaction. Current approaches, such as vision-based systems and wearable devices, face limitations in portability, usability, and practicality. We present a novel wearable system that reconstructs 3D hand pose and estimates per-finger forces using a minimal ring-watch sensor setup. A ring worn on the finger integrates an inertial measurement unit (IMU) to capture finger motion, while a smartwatch-based single-channel electromyography (EMG) sensor on the wrist detects muscle activations. By leveraging the complementary strengths of motion sensing and muscle signals, our approach achieves accurate hand pose tracking and grip force estimation in a compact wearable form factor. We develop a dual-branch transformer network that fuses IMU and EMG data with cross-modal attention to predict finger joint positions and forces simultaneously. A custom loss function imposes kinematic constraints and encourages smooth force variation with realistic force saturation. Evaluation with 20 participants performing daily object-interaction gestures demonstrates an average Mean Per Joint Position Error (MPJPE) of 0.57 cm and a fingertip force estimation RMSE of 0.213 (r = 0.76). We showcase our system in a real-time Unity application, enabling virtual hand interactions that respond to user-applied forces. This minimal, force-aware tracking system has broad implications for VR/AR, assistive prosthetics, and ergonomic monitoring.
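As one concrete reading of that loss description, the sketch below combines pose and force regression with two soft penalties: one discouraging abrupt frame-to-frame force changes (smoothness) and one discouraging predictions beyond a normalized maximum force (saturation). The weights `w_smooth` and `w_sat` and the ceiling `f_max` are illustrative assumptions; the paper's actual terms and coefficients may differ.

```python
import torch.nn.functional as F

def pose_force_loss(pred_pose, true_pose, pred_force, true_force,
                    f_max=1.0, w_smooth=0.1, w_sat=0.1):
    """Composite objective: pose and force regression plus penalties for
    smooth force trajectories and force saturation. All weights and the
    force ceiling f_max are illustrative, not the paper's values."""
    pose_loss = F.mse_loss(pred_pose, true_pose)
    force_loss = F.mse_loss(pred_force, true_force)
    # Temporal smoothness: penalize large frame-to-frame force changes.
    # pred_force: (B, T, n_fingers) sequence of per-finger forces.
    smooth = (pred_force[:, 1:] - pred_force[:, :-1]).pow(2).mean()
    # Saturation: penalize normalized forces exceeding the ceiling f_max.
    sat = F.relu(pred_force.abs() - f_max).pow(2).mean()
    return pose_loss + force_loss + w_smooth * smooth + w_sat * sat
```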