🤖 AI Summary
This work addresses two limitations of conventional data collection for imitation learning: it often decouples the operator from contact forces, so demonstrations fail to capture fine-grained force modulation, and it lacks explicit annotations of task structure. The authors propose a visuo-tactile data-collection system built around a direct-drive gripper that the operator actuates with the fingers, which preserves natural haptic feedback. The system simultaneously records image streams, contact geometry from custom tactile arrays, and temporal annotations that the operator provides in real time by pressing a handle-mounted button to mark task-critical regions. By combining natural haptic feedback, in-hand force perception, and in-situ temporal annotation, the approach yields multimodal demonstration datasets enriched with contact and temporal structure, designed for coarse-to-fine learning algorithms that exploit structural task knowledge.
📝 Abstract
We present a visuo-tactile data-collection system that generates temporally structured, contact-rich demonstrations for imitation learning. Conventional systems often decouple the operator from contact forces, which hinders the demonstration of subtle force modulation. Our system introduces a direct-drive gripper that the operator actuates with the fingers, preserving natural haptic feedback. Integrated visual sensors and custom tactile arrays capture image streams and contact geometry. A handle-mounted push button enables the operator to annotate the task's temporal structure in real time by marking task-critical regions. By fusing in-hand force perception with in-situ temporal annotation, the system produces multimodal datasets designed for coarse-to-fine learning algorithms that exploit structural task knowledge, enabling the development of high-quality manipulation policies.
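To make the in-situ annotation idea concrete, the following is a minimal sketch of how a recorded button signal could be turned into task-critical time intervals. It is not the authors' implementation. The `Frame` record, its field names, and the `critical_regions` helper are hypothetical, and the sketch assumes the button is held down for the duration of a critical region (the abstract does not say whether the button is held or toggled).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Frame:
    """One synchronized sample of a demonstration (hypothetical layout)."""
    t: float       # timestamp in seconds
    button: bool   # assumed: True while the annotation button is held
    # Image and tactile payloads are omitted from this sketch.


def critical_regions(frames: List[Frame]) -> List[Tuple[float, float]]:
    """Return (start, end) timestamps of contiguous button-held spans."""
    regions: List[Tuple[float, float]] = []
    start = None
    for f in frames:
        if f.button and start is None:
            start = f.t                      # rising edge: region opens
        elif not f.button and start is not None:
            regions.append((start, f.t))     # falling edge: region closes
            start = None
    if start is not None:
        # Button still held when the recording ended.
        regions.append((start, frames[-1].t))
    return regions
```

A coarse-to-fine learner could, under these assumptions, treat frames inside the returned intervals as the fine-manipulation phases and the remaining frames as coarse motion.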