Gaze-Guided 3D Hand Motion Prediction for Detecting Intent in Egocentric Grasping Tasks

📅 2025-03-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In neural rehabilitation, intent recognition often relies solely on physiological signals while neglecting environmental context. To address this limitation, we propose a prospective 3D hand pose and joint motion sequence prediction method that operates without prior knowledge of target objects. Our approach innovatively integrates dynamic eye movement signals—specifically gaze fixation points—into a hand motion generation framework, jointly modeling historical hand trajectories and scene-level object information to enable zero-prior intent recognition. The method employs a multimodal temporal fusion architecture combining VQ-VAE with an autoregressive Transformer. Evaluated on a multi-subject, multi-object grasping dataset, our gaze-guided model reduces prediction error by 27.3% compared to baseline methods. Moreover, it achieves high-fidelity trajectory generation using only three input frames, demonstrating superior robustness in low-data regimes and enhanced suitability for real-time applications.

Technology Category

Application Category

📝 Abstract
Human intention detection with hand motion prediction is critical to drive the upper-extremity assistive robots in neurorehabilitation applications. However, the traditional methods relying on physiological signal measurement are restrictive and often lack environmental context. We propose a novel approach that predicts future sequences of both hand poses and joint positions. This method integrates gaze information, historical hand motion sequences, and environmental object data, adapting dynamically to the assistive needs of the patient without prior knowledge of the intended object for grasping. Specifically, we use a vector-quantized variational autoencoder for robust hand pose encoding with an autoregressive generative transformer for effective hand motion sequence prediction. We demonstrate the usability of these novel techniques in a pilot study with healthy subjects. To train and evaluate the proposed method, we collect a dataset consisting of various types of grasp actions on different objects from multiple subjects. Through extensive experiments, we demonstrate that the proposed method can successfully predict sequential hand movement. Especially, the gaze information shows significant enhancements in prediction capabilities, particularly with fewer input frames, highlighting the potential of the proposed method for real-world applications.
Problem

Research questions and friction points this paper is trying to address.

Predicts hand poses and joint positions for intent detection
Integrates gaze, hand motion, and object data dynamically
Enhances prediction with gaze, especially with fewer frames
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates gaze, hand motion, and object data
Uses vector-quantized variational autoencoder for encoding
Employs autoregressive transformer for motion prediction
🔎 Similar Papers
No similar papers found.