🤖 AI Summary
This work addresses the challenge of acquiring high-fidelity human demonstration data for dexterous multi-finger manipulation, which is hindered by occlusion, complex hand kinematics, and dense contact interactions. To overcome this, the authors propose WHED, a wearability-first hand exoskeleton system featuring a pose-tolerant thumb coupling mechanism, linkage-driven finger interfaces with passive fit accommodation, and multimodal sensing that combines joint encoders, AR-based end-effector pose estimation, and synchronized wrist-mounted vision. The design preserves natural hand motion, including natural thumb behavior, while maintaining a consistent kinematic mapping to the target robotic hand. An end-to-end synchronized capture-and-replay pipeline is used to collect representative manipulation sequences, such as precision pinches and whole-hand enveloping grasps, and robotic replay of these sequences shows qualitative consistency with the human demonstrations.
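The summary refers to a kinematic mapping from the exoskeleton to the robot hand but does not state its form. Purely as an illustration, the sketch below assumes a per-joint affine calibration (robot angle = gain × encoder angle + offset) fitted by least squares from paired calibration readings, with the result clamped to the robot's joint limits. `fit_affine`, `JointMap`, and `map_frame` are hypothetical names, and WHED's actual mapping, particularly for the pose-tolerant thumb coupling, may well be nonlinear or coupled across joints.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


def fit_affine(encoder: Sequence[float], robot: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of robot = gain * encoder + offset (hypothetical calibration step)."""
    n = len(encoder)
    if n != len(robot) or n < 2:
        raise ValueError("need at least two paired calibration samples")
    mx = sum(encoder) / n
    my = sum(robot) / n
    sxx = sum((x - mx) ** 2 for x in encoder)
    if sxx == 0:
        raise ValueError("encoder samples must vary to fit a gain")
    sxy = sum((x - mx) * (y - my) for x, y in zip(encoder, robot))
    gain = sxy / sxx
    return gain, my - gain * mx


@dataclass
class JointMap:
    """Assumed calibration for one exoskeleton encoder driving one robot joint."""
    gain: float
    offset: float
    lower: float  # robot joint lower limit (rad)
    upper: float  # robot joint upper limit (rad)

    def apply(self, encoder_rad: float) -> float:
        target = self.gain * encoder_rad + self.offset
        # Clamp so replay never commands a target outside the robot's joint range.
        return min(max(target, self.lower), self.upper)


def map_frame(encoder_frame: Sequence[float], maps: Sequence[JointMap]) -> List[float]:
    """Map one synchronized frame of encoder angles to robot joint targets."""
    if len(encoder_frame) != len(maps):
        raise ValueError("one calibration entry is required per encoder channel")
    return [m.apply(q) for q, m in zip(encoder_frame, maps)]
```

For example, with `maps = [JointMap(1.0, 0.0, 0.0, 1.6), JointMap(0.8, 0.1, 0.0, 1.2)]`, calling `map_frame([0.5, 2.0], maps)` gives `[0.5, 1.2]`: the second joint's affine target (1.7 rad) exceeds its assumed 1.2 rad limit and is clamped. Clamping is a conservative choice for replay; a real system might instead flag such frames for review.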
📝 Abstract
Scalable learning of dexterous manipulation remains bottlenecked by the difficulty of collecting natural, high-fidelity human demonstrations of multi-finger manipulation, which is hampered by occlusion, complex hand kinematics, and contact-rich interactions. We present WHED, a wearable hand-exoskeleton system designed for in-the-wild demonstration capture, guided by two principles: wearability-first operation for extended use, and a pose-tolerant, free-to-move thumb coupling that preserves natural thumb behaviors while maintaining a consistent mapping to the target robot's thumb degrees of freedom. WHED integrates a linkage-driven finger interface with passive fit accommodation, a modified passive hand with robust proprioceptive sensing, and an onboard sensing/power module. We also provide an end-to-end data pipeline that synchronizes joint-encoder readings, AR-based end-effector poses, and wrist-mounted visual observations, and that supports post-processing for time alignment and replay. We demonstrate feasibility on representative grasping and manipulation sequences spanning precision pinch and full-hand enclosure grasps, and show qualitative consistency between collected demonstrations and replayed executions.
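The abstract mentions post-processing for time alignment but does not describe the method. One common approach, assumed here only for illustration, is to resample each higher-rate sensor stream (such as joint encoders) onto the wrist-camera frame timestamps by linear interpolation, after shifting the sensor clock by a known constant offset. `align_to_frames` and its `clock_offset` parameter are hypothetical; frames outside the recorded span are marked NaN instead of extrapolated.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def align_to_frames(
    frame_ts: Sequence[float],
    stream_ts: Sequence[float],
    stream_vals: Sequence[Sequence[float]],
    clock_offset: float = 0.0,
) -> List[List[float]]:
    """Resample a sensor stream onto camera frame timestamps by linear interpolation.

    frame_ts:     camera frame timestamps (s), on the camera clock.
    stream_ts:    strictly increasing sensor timestamps (s), on the sensor clock.
    stream_vals:  one vector of samples per sensor timestamp (e.g. encoder angles).
    clock_offset: assumed constant offset mapping the sensor clock onto the camera clock.
    """
    t = [s + clock_offset for s in stream_ts]
    if not t or len(t) != len(stream_vals):
        raise ValueError("stream timestamps and values must be non-empty and equal length")
    if any(b <= a for a, b in zip(t, t[1:])):
        raise ValueError("stream timestamps must be strictly increasing")
    dim = len(stream_vals[0])
    aligned: List[List[float]] = []
    for f in frame_ts:
        if f < t[0] or f > t[-1]:
            # No sensor data brackets this frame: mark it missing rather than extrapolate.
            aligned.append([math.nan] * dim)
            continue
        i = bisect_right(t, f)  # first index with t[i] > f
        if i == len(t):  # frame coincides with the last sensor sample
            aligned.append([float(v) for v in stream_vals[-1]])
            continue
        t0, t1 = t[i - 1], t[i]
        w = (f - t0) / (t1 - t0)  # interpolation weight in [0, 1)
        v0, v1 = stream_vals[i - 1], stream_vals[i]
        aligned.append([a + w * (b - a) for a, b in zip(v0, v1)])
    return aligned
```

Component-wise linear interpolation suits scalar channels such as encoder angles and positions; orientation components of an AR end-effector pose would instead need spherical interpolation (slerp) of quaternions. In practice the clock offset would also have to be estimated, for instance from a shared synchronization event, rather than assumed known.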