OPENTOUCH: Bringing Full-Hand Touch to Real-World Interaction

📅 2025-12-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing wearable tactile sensors struggle to accurately perceive spatiotemporal contact patterns (timing, location, and force) across the entire hand in unconstrained, real-world settings; moreover, no large-scale, in-the-wild dataset synchronizes first-person video, full-hand tactile sensing, and hand pose. This work introduces OpenTouch, the first in-the-wild, first-person, full-hand tactile dataset, comprising 5.1 hours of synchronized multimodal data (RGB video, high-resolution tactile signals, and hand pose) and 2,900 fine-grained, text-annotated video clips. It establishes the first high-fidelity cross-modal synchronization and joint annotation pipeline for real-world tactile capture. We further propose novel benchmarks for tactile-augmented cross-modal retrieval and classification. Leveraging tactile signal encoding and contrastive learning, we demonstrate that tactile cues form a compact, highly discriminative modality that significantly improves the robustness of vision–tactile correspondence and enables precise retrieval of associated tactile states directly from in-the-wild video.
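The retrieval setup described above (tactile signal encoding plus contrastive learning) can be sketched as a symmetric InfoNCE objective over paired vision and tactile embeddings, in the style of CLIP. This is a minimal NumPy sketch under stated assumptions: the batch size, embedding dimension, and temperature are illustrative, and the paper's actual encoders and loss details may differ.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Project embeddings onto the unit sphere so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def infonce_loss(vision_emb, tactile_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired vision/tactile embeddings.

    vision_emb, tactile_emb: (B, D) arrays where row i of each is the
    embedding of the i-th synchronized video/tactile pair. The temperature
    value is an illustrative assumption, not taken from the paper.
    """
    v = l2_normalize(vision_emb)
    t = l2_normalize(tactile_emb)
    logits = v @ t.T / temperature           # (B, B) similarity matrix
    idx = np.arange(logits.shape[0])         # matched pairs sit on the diagonal

    def xent(lg):
        # Numerically stable cross-entropy with diagonal targets.
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[idx, idx].mean()

    # Average the video->touch and touch->video directions.
    return 0.5 * (xent(logits) + xent(logits.T))
```

At retrieval time, the same similarity matrix supports nearest-neighbor lookup: the argmax over a row of `v @ t.T` returns the tactile clip whose embedding best matches a given video query.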

📝 Abstract
The human hand is our primary interface to the physical world, yet egocentric perception rarely knows when, where, or how forcefully it makes contact. Robust wearable tactile sensors are scarce, and no existing in-the-wild datasets align first-person video with full-hand touch. To bridge the gap between visual perception and physical interaction, we present OpenTouch, the first in-the-wild egocentric full-hand tactile dataset, containing 5.1 hours of synchronized video-touch-pose data and 2,900 curated clips with detailed text annotations. Using OpenTouch, we introduce retrieval and classification benchmarks that probe how touch grounds perception and action. We show that tactile signals provide a compact yet powerful cue for grasp understanding, strengthen cross-modal alignment, and can be reliably retrieved from in-the-wild video queries. By releasing this annotated vision-touch-pose dataset and benchmark, we aim to advance multimodal egocentric perception, embodied learning, and contact-rich robotic manipulation.
Problem

Research questions and friction points this paper is trying to address.

Develops the first in-the-wild dataset linking video to full-hand touch
Introduces benchmarks to explore how touch connects perception and action
Aims to advance multimodal egocentric perception and contact-rich robotic manipulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

First in-the-wild synchronized video-touch-pose dataset
Tactile signals enhance grasp understanding and cross-modal alignment
Dataset enables retrieval and classification benchmarks for perception