🤖 AI Summary
Existing handheld methods for collecting robot manipulation demonstrations are limited in contact-rich bimanual tasks by poor hardware adaptability, low data efficacy, and the absence of realistic tactile feedback, resulting in poor demonstration replayability and insufficient policy robustness. This work proposes TAMEn, a tactile-aware manipulation engine, introducing a cross-morphology wearable interface that enables rapid adaptation to heterogeneous grippers. The system integrates dual-modality trajectory capture, combining high-fidelity motion capture with portable VR-based tracking, and incorporates a tactile-visualized teleoperation framework to establish a human-in-the-loop recovery mechanism, thereby forming a pyramid-structured data pipeline. The feasibility-aware pipeline significantly improves demonstration replayability, and the proposed visuo-tactile learning framework raises task success rates from 34% to 75% across diverse bimanual manipulation tasks. To foster community advancement, the authors open-source both the hardware designs and the collected dataset.
📝 Abstract
Handheld paradigms offer an efficient and intuitive way to collect large-scale demonstrations of robot manipulation. However, achieving contact-rich bimanual manipulation through these methods remains a pivotal challenge, substantially hindered by limited hardware adaptability and data efficacy. Prior hardware designs remain gripper-specific and often face a trade-off between tracking precision and portability. Furthermore, the lack of online feasibility checking during demonstration leads to poor replayability. More importantly, existing handheld setups struggle to collect interactive recovery data during robot execution, lacking the authentic tactile information necessary for robust policy refinement. To bridge these gaps, we present TAMEn, a tactile-aware manipulation engine for closed-loop data collection in contact-rich tasks. Our system features a cross-morphology wearable interface that enables rapid adaptation across heterogeneous grippers. To balance data quality and environmental diversity, we implement a dual-modal acquisition pipeline: a precision mode leveraging motion capture for high-fidelity demonstrations, and a portable mode utilizing VR-based tracking for in-the-wild acquisition and tactile-visualized recovery teleoperation. Building on this hardware, we unify large-scale tactile pretraining, task-specific bimanual demonstrations, and human-in-the-loop recovery data into a pyramid-structured data regime, enabling closed-loop policy refinement. Experiments show that our feasibility-aware pipeline significantly improves demonstration replayability, and that the proposed visuo-tactile learning framework increases task success rates from 34% to 75% across diverse bimanual manipulation tasks. We further open-source the hardware and dataset to facilitate reproducibility and support research in visuo-tactile manipulation.
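To make the pyramid-structured data regime more concrete, below is a minimal illustrative sketch. It is not taken from the paper or its released code; the class names (`PyramidDataset`, `Episode`), tier labels, and sampling weights are all assumptions, meant only to show how three tiers of decreasing size and increasing task specificity could be organized and mixed during policy training.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Episode:
    """One demonstration or recovery episode (hypothetical container for
    synchronized visual, tactile, and action frames)."""
    frames: list

@dataclass
class PyramidDataset:
    """Illustrative three-tier data regime: broad tactile pretraining data at the
    base, task-specific bimanual demonstrations in the middle, and a small set of
    human-in-the-loop recovery episodes at the top."""
    tactile_pretrain: list = field(default_factory=list)   # largest, least task-specific
    bimanual_demos: list = field(default_factory=list)     # mid-sized, task-specific
    recovery_episodes: list = field(default_factory=list)  # smallest, gathered during execution

    def sample(self, weights=(0.5, 0.3, 0.2)) -> Episode:
        """Draw one episode, mixing non-empty tiers with hypothetical weights."""
        tiers = [self.tactile_pretrain, self.bimanual_demos, self.recovery_episodes]
        populated = [t for t in tiers if t]
        tier_weights = [w for t, w in zip(tiers, weights) if t]
        tier = random.choices(populated, weights=tier_weights)[0]
        return random.choice(tier)
```

Under this reading of the abstract, closed-loop refinement would amount to appending newly collected recovery episodes to the top tier between training rounds, so the policy is repeatedly retrained on a dataset whose smallest tier reflects its own failure modes.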