🤖 AI Summary
To address the lack of low-cost, highly flexible platforms for collecting real-world multi-finger manipulation data, this paper introduces the first hardware-aware teleoperation system jointly optimized for dexterous manipulation, built around a 20-degree-of-freedom anthropomorphic hand. Its contributions are threefold: (1) a compact mechatronic architecture integrated with a custom multimodal sensing circuit that combines wrist-mounted RGB-D vision, distributed piezoresistive tactile sensing, and proprioception, all spatially aligned and delivered with sub-7 ms latency; (2) a teleoperation interface whose motion retargeting is subject to two constraints, improving control accuracy and stability; and (3) a diffusion-based policy trained on the collected demonstrations, which outperforms prior methods on grasping and manipulation tasks. The entire system is built from low-cost, commercial off-the-shelf components, and the authors state that the platform will be made public for reproducibility, with the aim of establishing a scalable data infrastructure for research in general-purpose robotic autonomy.
📝 Abstract
This paper addresses the scarcity of low-cost yet highly dexterous platforms for collecting real-world multi-fingered robot manipulation data in support of generalist robot autonomy. To this end, we propose the RAPID Hand, a co-optimized hardware and software platform in which a compact 20-DoF hand, robust whole-hand perception, and a high-DoF teleoperation interface are jointly designed. Specifically, RAPID Hand adopts a compact and practical hand ontology and a hardware-level perception framework that stably integrates wrist-mounted vision, fingertip tactile sensing, and proprioception with sub-7 ms latency and spatial alignment. Collecting high-quality demonstrations on high-DoF hands is challenging because existing teleoperation methods struggle with precision and stability on complex multi-fingered systems. We address this by co-optimizing the hand design, perception integration, and teleoperation interface through a universal actuation scheme, custom perception electronics, and two retargeting constraints. We evaluate the platform's hardware, perception, and teleoperation interface. A diffusion policy trained on the collected data outperforms prior works, validating the system's capability for reliable, high-quality data collection. The platform is built from low-cost, off-the-shelf components and will be made public to ensure reproducibility and ease of adoption.