ViTaMIn-B: A Reliable and Efficient Visuo-Tactile Bimanual Manipulation Interface

📅 2025-11-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing handheld data collection systems suffer from weak tactile sensing and pose tracking drift in bimanual, contact-rich manipulation tasks. To address these limitations, we propose ViTaMIn-B, a handheld system built around DuoTact, a compliant visuo-tactile sensor that captures high-resolution contact geometry. The sensor's global deformation is reconstructed as a 3D point cloud to improve tactile robustness and policy generalization across sensors. We further design a unified 6-DoF bimanual pose acquisition process based on Meta Quest controllers, which eliminates the trajectory drift of SLAM-based tracking. The system combines compliant tactile sensing, 3D deformation reconstruction, point-cloud-based policy learning, and controller-based pose tracking to collect high-fidelity bimanual manipulation data. User studies show high usability for both novice and expert operators. On four representative bimanual manipulation tasks, the system outperforms existing systems, demonstrating its robustness and task performance.

📝 Abstract
Handheld devices have opened up unprecedented opportunities to collect large-scale, high-quality demonstrations efficiently. However, existing systems often lack robust tactile sensing or reliable pose tracking to handle complex interaction scenarios, especially for bimanual and contact-rich tasks. In this work, we propose ViTaMIn-B, a more capable and efficient handheld data collection system for such tasks. We first design DuoTact, a novel compliant visuo-tactile sensor built with a flexible frame to withstand large contact forces during manipulation while capturing high-resolution contact geometry. To enhance the cross-sensor generalizability, we propose reconstructing the sensor's global deformation as a 3D point cloud and using it as the policy input. We further develop a robust, unified 6-DoF bimanual pose acquisition process using Meta Quest controllers, which eliminates the trajectory drift issue in common SLAM-based methods. Comprehensive user studies confirm the efficiency and high usability of ViTaMIn-B among novice and expert operators. Furthermore, experiments on four bimanual manipulation tasks demonstrate its superior task performance relative to existing systems.
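The abstract does not say how DuoTact's deformation is turned into a point cloud. As an illustration only, the Python sketch below assumes the sensor already yields a per-pixel indentation map (in millimetres) on a regular grid with a known pixel pitch. It lifts every cell into a 3D point in the sensor frame, then downsamples to a fixed size with farthest point sampling, a common choice for point-cloud policies. The function names, the input format, and the sampling method are hypothetical, not the authors' pipeline.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def deformation_to_point_cloud(
    depth_mm: Sequence[Sequence[float]], pixel_pitch_mm: float
) -> List[Point]:
    """Lift a per-pixel indentation map into 3D points in the sensor frame.

    Hypothetical input: depth_mm[r][c] is how far the gel surface at grid
    cell (r, c) is pressed inward, in millimetres. Every cell is kept, so the
    cloud covers the whole (global) surface, not only the contact patch.
    """
    points: List[Point] = []
    for r, row in enumerate(depth_mm):
        for c, d in enumerate(row):
            # x, y from the grid position; z is the inward indentation.
            points.append((c * pixel_pitch_mm, r * pixel_pitch_mm, -d))
    return points


def farthest_point_sample(points: Sequence[Point], k: int) -> List[Point]:
    """Greedy farthest point sampling down to at most k points."""
    if k <= 0 or not points:
        return []
    chosen = [0]  # start from the first point
    # nearest[i] = distance from point i to its closest already-chosen point
    nearest = [math.dist(p, points[0]) for p in points]
    while len(chosen) < min(k, len(points)):
        idx = max(range(len(points)), key=nearest.__getitem__)
        chosen.append(idx)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[idx]))
    return [points[i] for i in chosen]
```

Because the output is metric geometry rather than raw camera pixels, two sensors with different illumination or gel appearance would in principle produce comparable inputs, which is one plausible reading of the cross-sensor generalizability claim.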
Problem

Existing systems lack robust tactile sensing for complex interactions
Current methods suffer from unreliable pose tracking in bimanual tasks
SLAM-based approaches have trajectory drift issues during manipulation
Innovation

Compliant visuo-tactile sensor withstands large contact forces
Reconstructs global deformation as 3D point cloud input
Robust 6-DoF bimanual pose tracking eliminates trajectory drift
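The pose pipeline is only named here, not specified. The sketch below shows, under stated assumptions, the frame bookkeeping a unified bimanual setup needs. It assumes both controller poses arrive as 4x4 homogeneous transforms in the same tracking frame. A hypothetical fixed offset `ctrl_T_gripper`, assumed identical for both handles, maps each controller to its gripper. Both grippers are then expressed relative to one shared reference frame. It does not model how the Quest tracking itself avoids drift.

```python
import math
from typing import List, Tuple

Mat4 = List[List[float]]


def make_pose(yaw_rad: float, xyz: Tuple[float, float, float]) -> Mat4:
    """Homogeneous transform: rotation about z by yaw_rad plus translation xyz."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    x, y, z = xyz
    return [[c, -s, 0.0, x], [s, c, 0.0, y], [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]


def matmul(a: Mat4, b: Mat4) -> Mat4:
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def invert_rigid(t: Mat4) -> Mat4:
    """Invert [R p; 0 1] as [R^T  -R^T p; 0 1] (valid for rigid transforms only)."""
    rt = [[t[j][i] for j in range(3)] for i in range(3)]
    q = [-sum(rt[i][k] * t[k][3] for k in range(3)) for i in range(3)]
    return [rt[i] + [q[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def bimanual_gripper_poses(
    world_T_left_ctrl: Mat4,
    world_T_right_ctrl: Mat4,
    ctrl_T_gripper: Mat4,
    world_T_ref: Mat4,
) -> Tuple[Mat4, Mat4]:
    """Express both gripper poses in one shared reference frame.

    Assumption: both controller poses are reported in the same tracking
    frame ("world"), so a single reference transform serves both hands.
    ctrl_T_gripper is a hypothetical fixed mount offset from calibration.
    """
    ref_T_world = invert_rigid(world_T_ref)
    left = matmul(ref_T_world, matmul(world_T_left_ctrl, ctrl_T_gripper))
    right = matmul(ref_T_world, matmul(world_T_right_ctrl, ctrl_T_gripper))
    return left, right
```

Choosing the reference as the left gripper's pose at the start of an episode, for example, places the left gripper at the identity and the right gripper at its offset from it, so both arms' trajectories share one origin.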
Authors
Chuanyu Li (Tsinghua University)
Chaoyi Liu (Tsinghua University)
Daotan Wang (Tsinghua University)
Shuyu Zhang (Shenzhen University; research interests: remote sensing, geographical information system)
Lusong Li (JD Explore Academy)
Zecui Zeng (JD Explore Academy)
Fangchen Liu (Google DeepMind, UC Berkeley; research interests: machine learning, robotics)
Jing Xu (Tsinghua University)
Rui Chen (Tsinghua University)