AI Summary
Existing handheld grippers typically lack tactile sensing capabilities, limiting fine manipulation in complex environments. This paper introduces a lightweight, portable vision-tactile fusion gripper hardware platform that integrates high-resolution tactile sensors to enable synchronized acquisition of visual and tactile data in real-world field settings. We further propose a cross-modal representation learning framework that jointly models vision and touch, generating interpretable, contact-focused multimodal representations that preserve modality-specific characteristics while better capturing physically critical interaction cues. Evaluation on high-precision tasks, including test-tube insertion and pipetting, demonstrates that our approach significantly improves policy learning efficiency and operational robustness, particularly under external disturbances.
Abstract
Handheld grippers are increasingly used to collect human demonstrations due to their ease of deployment and versatility. However, most existing designs lack tactile sensing, despite the critical role of tactile feedback in precise manipulation. We present a portable, lightweight gripper with integrated tactile sensors that enables synchronized collection of visual and tactile data in diverse, real-world, and in-the-wild settings. Building on this hardware, we propose a cross-modal representation learning framework that integrates visual and tactile signals while preserving their distinct characteristics. The learning procedure yields interpretable representations that consistently focus on contact regions relevant to physical interaction. When used for downstream manipulation tasks, these representations enable more efficient and effective policy learning, supporting precise robotic manipulation based on multimodal feedback. We validate our approach on fine-grained tasks such as test tube insertion and pipette-based fluid transfer, demonstrating improved accuracy and robustness under external disturbances. Our project page is available at https://binghao-huang.github.io/touch_in_the_wild/.
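To make the cross-modal idea concrete, the sketch below shows one way such a framework could be structured: separate vision and tactile encoders (so each modality keeps its own characteristics) whose embeddings are aligned on synchronized camera/tactile frames with an InfoNCE-style contrastive objective. The module names, dimensions, and loss are illustrative assumptions for this sketch, not the paper's actual architecture or training procedure.

```python
# Minimal sketch of cross-modal vision-tactile representation learning.
# All names, dimensions, and the contrastive objective are illustrative
# assumptions; they do not reproduce the authors' framework.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModalityEncoder(nn.Module):
    """Small CNN encoder kept separate per modality to preserve modality-specific features."""

    def __init__(self, in_channels: int, embed_dim: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.proj = nn.Linear(64, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # L2-normalized embedding so the dot product acts as cosine similarity.
        return F.normalize(self.proj(self.backbone(x)), dim=-1)


def cross_modal_alignment_loss(z_vision: torch.Tensor, z_tactile: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE-style loss: pull together vision/tactile embeddings from the same
    synchronized frame, push apart embeddings from different frames."""
    logits = z_vision @ z_tactile.t() / temperature
    targets = torch.arange(z_vision.size(0), device=z_vision.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))


if __name__ == "__main__":
    vision_enc = ModalityEncoder(in_channels=3)    # wrist-camera RGB frames
    tactile_enc = ModalityEncoder(in_channels=3)   # tactile sensor images

    rgb = torch.randn(8, 3, 96, 96)     # batch of synchronized camera frames
    touch = torch.randn(8, 3, 64, 64)   # corresponding tactile readings

    loss = cross_modal_alignment_loss(vision_enc(rgb), tactile_enc(touch))
    loss.backward()
    print(f"alignment loss: {loss.item():.3f}")
```

In this kind of two-tower setup, the synchronized visual and tactile streams collected by the gripper provide the paired samples for the contrastive objective, and the resulting per-modality embeddings can then be fed to a downstream manipulation policy.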