AI Summary
Data collection for dexterous manipulation with multi-fingered robotic hands is difficult and scales poorly, resulting in a severe scarcity of high-quality training data and hindering data-driven policy learning. To address this, we propose a hand-agnostic, progressive kinematics-dynamics alignment framework that synthesizes semantically consistent and motion-smooth robotic dexterous manipulation trajectories directly from monocular human demonstration videos, without requiring any real-world robot interaction data. Our method integrates kinematic matching, thumb-guided initialization, action-space remapping, residual policy learning under a unified reward, and wrist-trajectory optimization to enhance inter-finger coordination. Evaluated across diverse hand morphologies, objects, and manipulation tasks, the framework enables efficient cross-domain policy transfer, achieving an average success rate of 73%. It significantly improves data efficiency and generalization capability over prior approaches.
Abstract
The inherent difficulty and limited scalability of collecting manipulation data on multi-fingered robot hand hardware have resulted in severe data scarcity, impeding research on data-driven dexterous manipulation policy learning. To address this challenge, we present a hand-agnostic manipulation transfer system. It efficiently converts human hand manipulation sequences from demonstration videos into high-quality dexterous manipulation trajectories without requiring massive training data. To tackle the multi-dimensional disparities between human hands and dexterous hands, as well as the challenges posed by coordinated control of high-degree-of-freedom dexterous hands, we design a progressive transfer framework: first, we establish primary control signals for dexterous hands based on kinematic matching; subsequently, we train residual policies with action space rescaling and thumb-guided initialization to dynamically optimize contact interactions under unified rewards; finally, we compute wrist control trajectories with the objective of preserving operational semantics. Using only human hand manipulation videos, our system automatically configures its parameters for each task, balancing kinematic matching and dynamic optimization across dexterous hands, object categories, and tasks. Extensive experimental results demonstrate that our framework automatically generates smooth and semantically correct dexterous hand manipulation that faithfully reproduces human intentions, achieving high efficiency and strong generalizability with an average transfer success rate of 73%, and providing an easily implementable and scalable method for collecting robot dexterous manipulation data.
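The abstract does not say how the residual policy output is combined with the kinematically matched control signal, so the following is only a minimal sketch of one common way to do it, under stated assumptions. It assumes the residual is a per-joint value in [-1, 1], that "action space rescaling" means scaling that value by a fraction of each joint's range, and that the result is clamped to joint limits. The function name `compose_action` and the parameter `scale` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence


def compose_action(
    base: Sequence[float],
    residual: Sequence[float],
    lower: Sequence[float],
    upper: Sequence[float],
    scale: float = 0.1,
) -> List[float]:
    """Add a rescaled residual to a kinematic base target, then clamp.

    base:     joint targets from kinematic matching (assumed given).
    residual: policy output, assumed to lie in [-1, 1] per joint.
    lower/upper: joint limits of the robot hand.
    scale:    hypothetical fraction of each joint's range that the
              residual is allowed to cover.
    """
    if not (len(base) == len(residual) == len(lower) == len(upper)):
        raise ValueError("all inputs must have one entry per joint")

    action = []
    for b, r, lo, hi in zip(base, residual, lower, upper):
        r = max(-1.0, min(1.0, r))            # keep the residual bounded
        target = b + scale * (hi - lo) * r    # rescale by the joint range
        action.append(max(lo, min(hi, target)))  # respect joint limits
    return action
```

With a zero residual this returns the kinematic base target unchanged, so under these assumptions the learned policy only has to correct contact interactions rather than produce the whole motion.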