AI Summary
To address the high cost, low efficiency, and robot dependency of tactile data acquisition in dexterous manipulation, this paper proposes a robot-free, teleoperation-free, portable visuotactile manipulation interface. The interface integrates a Fin Ray-based soft gripper with a high-density tactile sensor array, enabling efficient, contact-intensive, handheld visuotactile co-acquisition. Furthermore, we design a cross-modal self-supervised pretraining method to learn robust multimodal tactile representations and build an end-to-end imitation learning framework. Evaluated on seven representative contact-rich tasks, our approach achieves over 3× higher data collection efficiency than baseline methods, while significantly improving policy generalization and robustness to disturbances. Key innovations include (i) the first handheld visuotactile co-acquisition paradigm and (ii) a novel multimodal representation learning mechanism tailored for few-shot tactile understanding.
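The summary mentions an end-to-end imitation learning framework over visual and tactile inputs but does not describe its architecture. The sketch below is a minimal, hypothetical illustration of how such a visuotactile behavior-cloning policy could be structured; the class name `VisuotactilePolicy`, the encoder modules, the fusion scheme, and the action dimension are assumptions, not details from the paper.

```python
# Minimal sketch of a fused visuotactile behavior-cloning policy.
# All module names and dimensions here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisuotactilePolicy(nn.Module):
    def __init__(self, vision_encoder: nn.Module, tactile_encoder: nn.Module,
                 action_dim: int = 7, hidden: int = 256):
        super().__init__()
        self.vision_encoder = vision_encoder    # e.g., an image backbone over gripper-camera frames
        self.tactile_encoder = tactile_encoder  # e.g., a pretrained tactile-array encoder
        self.head = nn.Sequential(              # fuse both modalities and regress an action
            nn.LazyLinear(hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, image: torch.Tensor, tactile: torch.Tensor) -> torch.Tensor:
        # Concatenate per-modality features, then map to a gripper/end-effector action.
        feats = torch.cat([self.vision_encoder(image), self.tactile_encoder(tactile)], dim=-1)
        return self.head(feats)

# Hypothetical behavior-cloning step on handheld demonstrations:
# loss = F.mse_loss(policy(image, tactile), expert_action)
```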
Abstract
Tactile information plays a crucial role in enabling humans and robots to interact effectively with their environment, particularly in tasks that require an understanding of contact properties. Solving such dexterous manipulation tasks typically relies on imitation learning from demonstration datasets, which are usually collected via teleoperation systems that demand substantial time and effort. To address these challenges, we present ViTaMIn, an embodiment-free manipulation interface that seamlessly integrates visual and tactile sensing into a handheld gripper, enabling data collection without the need for teleoperation. Our design employs a compliant Fin Ray gripper with tactile sensing, allowing operators to perceive force feedback during manipulation for more intuitive operation. Additionally, we propose a multimodal representation learning strategy to obtain pretrained tactile representations, improving data efficiency and policy robustness. Experiments on seven contact-rich manipulation tasks show that ViTaMIn significantly outperforms baseline methods, demonstrating its effectiveness for complex manipulation tasks.
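The abstract does not spell out the multimodal representation learning objective. One common way to realize cross-modal self-supervised pretraining over paired visual and tactile observations is a contrastive (InfoNCE-style) alignment loss; the sketch below is a minimal illustration under that assumption. The `CrossModalPretrainer` class, the encoder modules, the projection dimension, and the temperature are hypothetical and not taken from the paper.

```python
# Minimal sketch of contrastive visual-tactile pretraining (InfoNCE-style).
# This is an assumed objective, not necessarily the one used by ViTaMIn.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalPretrainer(nn.Module):
    def __init__(self, vision_encoder: nn.Module, tactile_encoder: nn.Module, dim: int = 128):
        super().__init__()
        self.vision_encoder = vision_encoder    # encodes gripper-camera frames to feature vectors
        self.tactile_encoder = tactile_encoder  # encodes tactile-array readings to feature vectors
        self.vision_proj = nn.LazyLinear(dim)   # project both modalities into a shared space
        self.tactile_proj = nn.LazyLinear(dim)
        self.temperature = 0.07

    def forward(self, images: torch.Tensor, tactile: torch.Tensor) -> torch.Tensor:
        # Embed each modality and L2-normalize the shared-space features.
        z_v = F.normalize(self.vision_proj(self.vision_encoder(images)), dim=-1)
        z_t = F.normalize(self.tactile_proj(self.tactile_encoder(tactile)), dim=-1)
        # Symmetric InfoNCE: matching (image, tactile) pairs are positives,
        # every other pairing in the batch serves as a negative.
        logits = z_v @ z_t.t() / self.temperature
        targets = torch.arange(logits.size(0), device=logits.device)
        return 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.t(), targets))
```

Under this reading, the tactile encoder pretrained on co-collected demonstrations would then be reused (frozen or fine-tuned) inside the downstream imitation learning policy, which is consistent with the stated goal of improving data efficiency and policy robustness.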