🤖 AI Summary
Current VR systems track only the headset and hand controllers, not full-body joint motion, which limits the accuracy of behavioral biometrics. To address this, we propose an approach driven by external 2D video: an external 2D camera captures 2D poses of right-side body joints (shoulder, elbow, wrist, hip, knee, ankle), and a Transformer-based temporal model uses these observations to predict both past and future 3D trajectories of the right VR controller. This is the first work to jointly integrate external 2D pose estimation with temporal modeling for implicit reconstruction of body motions unobserved by the VR system. The approach achieves a minimum equal error rate (EER) of 0.025 in VR-based identity authentication, with a maximum EER reduction of 0.040 relative to prior work that uses a single-unit 3D trajectory as input.
📝 Abstract
Critical VR applications in domains such as healthcare, education, and finance that rely on traditional credentials, such as PINs, passwords, or multi-factor authentication, can be compromised if a malicious person acquires the user's credentials or if the user hands their credentials to an ally. Recently, a number of user authentication approaches have emerged that use the motions of VR head-mounted displays (HMDs) and hand controllers during user interactions in VR to represent the user's behavior as a VR biometric signature. A fundamental limitation of behavior-based approaches is that current on-device tracking for HMDs and controllers cannot track full-body joint articulation, losing key signature data encapsulated in the user's articulation. In this paper, we propose an approach that uses 2D body joints, namely shoulder, elbow, wrist, hip, knee, and ankle, acquired from the right side of the participants using an external 2D camera. Using a Transformer-based deep neural network, our method takes the 2D data of body joints that are not tracked by the VR device and predicts past and future 3D tracks of the right controller, augmenting authentication with additional 3D knowledge. Our approach achieves a minimum equal error rate (EER) of 0.025 and a maximum EER drop of 0.040 over prior work that uses a single-unit 3D trajectory as the input.
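For readers unfamiliar with the metric, the EER is the operating point at which the false acceptance rate (impostors accepted) equals the false rejection rate (genuine users rejected); lower is better. The following is a minimal sketch of how an EER can be estimated from match scores. It is not the evaluation code of the paper: the function name `equal_error_rate`, the convention that higher scores mean "more likely genuine", and the toy scores are all assumptions made for illustration.

```python
import numpy as np


def equal_error_rate(genuine_scores, impostor_scores):
    """Estimate the EER from similarity scores (higher = more likely genuine).

    Returns (eer, threshold). Hypothetical helper for illustration only.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)

    # Use every observed score as a candidate decision threshold.
    thresholds = np.unique(np.concatenate([genuine, impostor]))

    # FRR: fraction of genuine attempts rejected (score below threshold).
    frr = np.array([(genuine < t).mean() for t in thresholds])
    # FAR: fraction of impostor attempts accepted (score at or above threshold).
    far = np.array([(impostor >= t).mean() for t in thresholds])

    # The EER lies where the two curves cross; with finitely many scores,
    # take the threshold where they are closest and average the two rates.
    idx = int(np.argmin(np.abs(far - frr)))
    eer = (far[idx] + frr[idx]) / 2.0
    return float(eer), float(thresholds[idx])


# Toy scores (made up, not from the paper).
genuine = [0.9, 0.8, 0.7, 0.6]
impostor = [0.1, 0.2, 0.3, 0.65]
eer, threshold = equal_error_rate(genuine, impostor)
print(f"EER = {eer:.3f} at threshold {threshold:.2f}")
```

Under this definition, an EER of 0.025 corresponds to an operating point where roughly 2.5% of impostor attempts are accepted and 2.5% of genuine attempts are rejected.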