🤖 AI Summary
This work addresses key challenges in deploying highly dexterous anthropomorphic robotic hands in real-world scenarios: learning from limited samples, producing smooth fine-motor actions, and generalizing beyond the training distribution. We propose a high-frequency generative control recipe based on diffusion models. The system combines a custom-built 16-DoF tendon-driven hand, wide-angle wrist cameras, and a Franka Emika Panda arm, and trains end-to-end policies from raw sensory inputs. Data are collected through a teleoperation pipeline with both glove-based and VR interfaces, and the policies are evaluated on physical hardware. The main findings are emergent self-correcting behaviors in the learned policies and scaling trends in policy performance. Real-world evaluations show up to 93.3% success on out-of-distribution tasks, with up to a +33.3% performance boost attributed to self-correction, improving robustness in complex manipulation tasks.
📝 Abstract
We present a diffusion-based model recipe for real-world control of a highly dexterous humanoid robotic hand, designed for sample-efficient learning and smooth fine-motor action inference. Our system features a newly designed 16-DoF tendon-driven hand, equipped with wide-angle wrist cameras and mounted on a Franka Emika Panda arm. We develop a versatile teleoperation pipeline and data collection protocol using both glove-based and VR interfaces, enabling high-quality data collection across diverse tasks such as pick-and-place, item sorting, and assembly insertion. Leveraging high-frequency generative control, we train end-to-end policies from raw sensory inputs, enabling smooth, self-correcting motions in complex manipulation scenarios. Real-world evaluations demonstrate out-of-distribution success rates of up to 93.3%, with up to a +33.3% performance boost due to emergent self-correcting behaviors, while also revealing scaling trends in policy performance. Our results advance the state of the art in dexterous robotic manipulation through a fully integrated, practical approach to hardware, learning, and real-world deployment.