🤖 AI Summary
Imitation learning for dexterous robotic hands is held back by a scarcity of high-fidelity demonstration data, which stems from large motion-retargeting errors, inefficient data acquisition, and the absence of high-resolution fingertip tactile sensing. To address this, we propose MILE, a retargeting-free, high-fidelity teleoperation and data-collection system: it pairs a mechanically isomorphic exoskeleton with a robotic hand through 1:1 joint mapping for precise pose transmission; integrates compact, high-resolution vision-based tactile sensors into the fingertips; and synchronizes RGB-D and the other sensory streams across the entire pipeline. The exoskeleton achieves a multi-joint mean absolute angular error below 1°, and the teleoperation pipeline improves the mean success rate by 64%. Incorporating fingertip tactile observations raises task success by a further 25% on average over a vision-only baseline. We also introduce the first large-scale, multimodal, high-fidelity benchmark dataset designed specifically for dexterous manipulation.
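The stream synchronization mentioned in the summary can be illustrated with a minimal sketch. The summary does not describe how MILE aligns its sensors, so the nearest-timestamp matching below is only one common approach, and the names `nearest_index`, `align_streams`, and `max_skew` are hypothetical.

```python
from bisect import bisect_left


def nearest_index(timestamps, t):
    """Return the index of the entry in sorted `timestamps` closest to `t`."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Pick whichever neighbour is closer in time (ties go to the earlier one).
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1


def align_streams(reference, others, max_skew):
    """Pair each reference timestamp with the nearest sample of every other stream.

    reference: sorted timestamps (seconds) of the reference stream, e.g. RGB-D.
    others:    dict mapping a stream name to its sorted timestamps.
    max_skew:  assumed tolerance; a reference frame is dropped if any stream
               has no sample within this many seconds of it.
    Returns a list of (reference_index, {stream_name: sample_index}).
    """
    aligned = []
    for ref_idx, t in enumerate(reference):
        match = {}
        for name, stamps in others.items():
            j = nearest_index(stamps, t)
            if abs(stamps[j] - t) > max_skew:
                break  # this stream has no sample close enough
            match[name] = j
        else:
            aligned.append((ref_idx, match))
    return aligned


if __name__ == "__main__":
    rgbd = [0.0, 0.1, 0.2]
    streams = {"tactile": [0.01, 0.09, 0.35], "joints": [0.0, 0.1, 0.2]}
    print(align_streams(rgbd, streams, max_skew=0.05))
```

Dropping frames that exceed the tolerance, rather than interpolating, keeps every stored sample an actual measurement; whether that trade-off suits a given dataset is a design choice this sketch does not settle.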
📝 Abstract
Imitation learning provides a promising approach to dexterous hand manipulation, but its effectiveness is limited by the lack of large-scale, high-fidelity data. Existing data-collection pipelines suffer from inaccurate motion retargeting, low data-collection efficiency, and missing high-resolution fingertip tactile sensing. We address this gap with MILE, a mechanically isomorphic teleoperation and data-collection system co-designed from human hand to exoskeleton to robotic hand. The exoskeleton is anthropometrically derived from the human hand, and the robotic hand preserves one-to-one joint-position isomorphism, eliminating nonlinear retargeting and enabling precise, natural control. The exoskeleton achieves a multi-joint mean absolute angular error below one degree, while the robotic hand integrates compact fingertip visuotactile modules that provide high-resolution tactile observations. Built on this retargeting-free interface, we teleoperate complex, contact-rich in-hand manipulation and efficiently collect a multimodal dataset comprising high-resolution fingertip visuotactile signals, RGB-D images, and joint positions. The teleoperation pipeline achieves a mean success rate improvement of 64%. Incorporating fingertip tactile observations further increases the success rate by an average of 25% over the vision-only baseline, validating the fidelity and utility of the dataset. Further details are available at: https://sites.google.com/view/mile-system.
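To make the retargeting-free idea and the reported error metric concrete, here is a minimal sketch. It assumes joint angles are available in degrees as plain lists; `direct_joint_map`, `mean_absolute_angle_error`, and the joint limits are hypothetical illustrations, not the authors' implementation. The clamping step is an added assumption for safety and is not described in the abstract.

```python
def direct_joint_map(exo_angles_deg, joint_limits_deg):
    """Copy exoskeleton joint angles to robot joint targets one-to-one.

    With joint-position isomorphism, joint i on the exoskeleton drives joint i
    on the robotic hand directly, so no nonlinear retargeting is solved.
    Clamping to (low, high) limits is an assumption added for illustration.
    """
    if len(exo_angles_deg) != len(joint_limits_deg):
        raise ValueError("isomorphic mapping needs one limit pair per joint")
    return [
        min(max(angle, low), high)
        for angle, (low, high) in zip(exo_angles_deg, joint_limits_deg)
    ]


def mean_absolute_angle_error(measured_deg, reference_deg):
    """Mean absolute error in degrees, averaged over all frames and joints.

    Both arguments are lists of frames; each frame is a list of joint angles.
    """
    total = 0.0
    count = 0
    for measured_frame, reference_frame in zip(measured_deg, reference_deg):
        if len(measured_frame) != len(reference_frame):
            raise ValueError("frames must have the same number of joints")
        for measured, reference in zip(measured_frame, reference_frame):
            total += abs(measured - reference)
            count += 1
    if count == 0:
        raise ValueError("no samples to average")
    return total / count


if __name__ == "__main__":
    limits = [(0.0, 90.0)] * 3  # illustrative limits, not from the paper
    print(direct_joint_map([10.0, 95.0, -5.0], limits))
    measured = [[10.0, 20.0], [30.0, 40.0]]
    reference = [[10.5, 19.0], [30.0, 41.5]]
    print(mean_absolute_angle_error(measured, reference))
```

A multi-joint mean absolute angular error below one degree, as reported for the exoskeleton, would correspond to `mean_absolute_angle_error` returning a value under 1.0 when exoskeleton readings are compared against a ground-truth measurement.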