ViTaMIn: Learning Contact-Rich Tasks Through Robot-Free Visuo-Tactile Manipulation Interface

📅 2025-04-08
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
To address the high cost, low efficiency, and robot dependency of tactile data acquisition for dexterous manipulation, this paper proposes a robot-free, teleoperation-free, portable visuo-tactile manipulation interface. The interface integrates a Fin Ray-based soft gripper with a high-density tactile sensor array, enabling efficient hand-held co-acquisition of visual and tactile data for contact-rich tasks. The authors further design a cross-modal self-supervised pretraining method to learn robust multimodal tactile representations and build an end-to-end imitation learning framework on top of them. Evaluated on seven representative contact-rich tasks, the approach achieves over 3× higher data collection efficiency than baseline methods while significantly improving policy generalization and robustness to disturbances. Key innovations include (i) the first hand-held visuo-tactile co-acquisition paradigm and (ii) a multimodal representation learning mechanism tailored for few-shot tactile understanding.
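The summary names a cross-modal self-supervised pretraining method for tactile representations but does not spell out its form here. Below is a minimal sketch of one common instantiation, contrastive (InfoNCE-style) alignment between tactile and visual embeddings from the same time step; the encoder architectures, class names, and hyperparameters are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: contrastive pretraining that aligns tactile and visual
# embeddings from paired observations. All module names and sizes are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TactileEncoder(nn.Module):
    """Small CNN over a tactile sensor array (assumed input: 1 x H x W)."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )

    def forward(self, x):
        # L2-normalized embedding so dot products behave like cosine similarity
        return F.normalize(self.net(x), dim=-1)


class VisionEncoder(nn.Module):
    """Small CNN over RGB camera frames (assumed input: 3 x H x W)."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)


def info_nce(tactile_z: torch.Tensor, visual_z: torch.Tensor,
             temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE: matched tactile/visual pairs in a batch are positives."""
    logits = tactile_z @ visual_z.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```

The pretrained tactile encoder would then be reused downstream, which is one plausible way to realize the data efficiency the summary attributes to the learned representations.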

πŸ“ Abstract
Tactile information plays a crucial role for humans and robots to interact effectively with their environment, particularly for tasks requiring the understanding of contact properties. Solving such dexterous manipulation tasks often relies on imitation learning from demonstration datasets, which are typically collected via teleoperation systems and often demand substantial time and effort. To address these challenges, we present ViTaMIn, an embodiment-free manipulation interface that seamlessly integrates visual and tactile sensing into a hand-held gripper, enabling data collection without the need for teleoperation. Our design employs a compliant Fin Ray gripper with tactile sensing, allowing operators to perceive force feedback during manipulation for more intuitive operation. Additionally, we propose a multimodal representation learning strategy to obtain pre-trained tactile representations, improving data efficiency and policy robustness. Experiments on seven contact-rich manipulation tasks show that ViTaMIn significantly outperforms baseline methods, demonstrating its effectiveness for complex manipulation tasks.
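As a companion to the abstract's description of pre-trained tactile representations used for imitation learning, the sketch below shows one hedged way a behavior-cloning policy could fuse a frozen, pretrained tactile encoder with a visual encoder to regress actions from collected demonstrations. The fusion head, action dimensionality, and loss are assumptions for illustration, not the paper's reported architecture.

```python
# Illustrative behavior-cloning policy over fused visuo-tactile features.
# Assumes encoders like those in the previous sketch; all sizes are assumed.
import torch
import torch.nn as nn


class VisuoTactilePolicy(nn.Module):
    def __init__(self, tactile_encoder: nn.Module, vision_encoder: nn.Module,
                 embed_dim: int = 128, action_dim: int = 7):
        super().__init__()
        self.tactile_encoder = tactile_encoder
        self.vision_encoder = vision_encoder
        # Keep the pretrained tactile representation frozen; only the vision
        # encoder and fusion head are trained with behavior cloning.
        for p in self.tactile_encoder.parameters():
            p.requires_grad = False
        self.head = nn.Sequential(
            nn.Linear(2 * embed_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, tactile_obs: torch.Tensor, rgb_obs: torch.Tensor):
        with torch.no_grad():
            z_tac = self.tactile_encoder(tactile_obs)
        z_vis = self.vision_encoder(rgb_obs)
        return self.head(torch.cat([z_tac, z_vis], dim=-1))


def bc_loss(policy: VisuoTactilePolicy, tactile_obs, rgb_obs, expert_actions):
    """Mean-squared behavior-cloning loss against demonstrated actions."""
    pred = policy(tactile_obs, rgb_obs)
    return nn.functional.mse_loss(pred, expert_actions)
```

Freezing the tactile encoder is one design choice consistent with the claimed data efficiency: the few demonstrations only have to fit a small fusion head rather than the full representation.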
Problem

Research questions and friction points this paper is trying to address.

Tactile demonstration data for contact-rich tasks is typically collected via robot teleoperation, which is slow and costly
Data collection interfaces rarely combine visual and tactile sensing in an intuitive, portable form
Imitation-learned manipulation policies need better data efficiency and robustness to disturbances
Innovation

Methods, ideas, or system contributions that make the work stand out.

Embodiment-free interface with visual-tactile sensing
Compliant Fin Ray gripper for force feedback
Multimodal representation learning for tactile data
Authors

Fangchen Liu
Google DeepMind, UC Berkeley
Machine Learning, Robotics
Chuanyu Li
Tsinghua University
Yihua Qin
Tsinghua University
Ankit Shaw
Tsinghua University
Jing Xu
Tsinghua University
Pieter Abbeel
UC Berkeley | Covariant
Robotics, Machine Learning, AI
Rui Chen
Tsinghua University