🤖 AI Summary
This study investigates whether head and hand motion in virtual reality (VR) encodes subtle cognitive states, such as confusion, hesitation, and readiness, that lack clear motor correlates. To this end, we introduce a novel dataset of head and hand motion with frame-level annotations of these states, collected during structured decision-making tasks. Our results suggest that deep temporal models can infer these cognitive states from motion alone, achieving performance comparable to that of human observers. This indicates that standard VR telemetry contains strong patterns related to users' internal cognitive processes. Key contributions include: (1) evidence that cognitive states without clear motor correlates can be inferred from movement dynamics; and (2) a labeled dataset and modeling framework, both planned for public release, that could support a new generation of adaptive virtual environments.
📝 Abstract
As virtual reality (VR) and augmented reality (AR) continue to gain popularity, head and hand motion data captured by consumer VR systems have become ubiquitous. Prior work shows that such telemetry can be highly identifying and can reflect broad user traits, often aligning with intuitive "folk theories" of body language. However, it remains unclear to what extent motion kinematics encode more nuanced cognitive states, such as confusion, hesitation, and readiness, which lack clear correlates with motion. To investigate this, we introduce a novel dataset of head and hand motion with frame-level annotations of these states, collected during structured decision-making tasks. Our findings suggest that deep temporal models can infer subtle cognitive states from motion alone, achieving performance comparable to that of human observers. This work demonstrates that standard VR telemetry contains strong patterns related to users' internal cognitive processes, which opens the door to a new generation of adaptive virtual environments. To enhance reproducibility and support future work, we will make our dataset and modeling framework publicly available.