Vi-TacMan: Articulated Object Manipulation via Vision and Touch

📅 2025-10-07
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Autonomous manipulation of articulated objects faces two complementary challenges: vision-based methods generalize poorly to unfamiliar objects, while tactile methods depend on accurate initialization. To address both, we propose a vision-guided, tactile-refined collaborative framework that operates without prior kinematic models: vision provides global pose estimates and initial grasp configurations, while tactile feedback enables local closed-loop control. We use surface normals as a geometric prior to constrain motion direction and model joint-axis orientation probabilistically with a von Mises–Fisher distribution, improving cross-category generalization. Evaluated on more than 50,000 simulated objects and a diverse set of real-world objects, the method significantly outperforms baselines (p < 0.0001), demonstrating robustness, scalability, and zero-shot transfer across diverse articulated objects.
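
As an illustration of how joint-axis direction could be modeled this way, the sketch below scores candidate motion directions under a von Mises–Fisher distribution whose mean is seeded by a vision-estimated surface normal. The concentration value, helper names, and toy inputs are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def vmf_log_pdf(x, mu, kappa):
    """Log-density of a 3-D von Mises-Fisher distribution.
    x, mu: unit 3-vectors; kappa: concentration (> 0)."""
    # Normalizing constant for dimension 3: C_3(kappa) = kappa / (4*pi*sinh(kappa))
    log_c = np.log(kappa) - np.log(4 * np.pi) - np.log(np.sinh(kappa))
    return log_c + kappa * np.dot(mu, x)

def score_directions(surface_normal, candidates, kappa=20.0):
    """Rank candidate motion directions against a vMF prior whose mean
    is the (unit-normalized) surface normal estimated by vision."""
    mu = surface_normal / np.linalg.norm(surface_normal)
    cands = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    scores = np.array([vmf_log_pdf(c, mu, kappa) for c in cands])
    order = np.argsort(-scores)
    return cands[order], scores[order]

# Toy example: a noisy normal estimate and a few candidate push directions.
normal = np.array([0.1, 0.0, 1.0])                  # hypothetical vision output
candidates = np.array([[0.0, 0.0, 1.0],
                       [0.0, 1.0, 0.0],
                       [0.2, 0.1, 0.97]])
ranked, scores = score_directions(normal, candidates)
print(ranked[0], scores[0])                         # best-aligned direction first
```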

πŸ“ Abstract
Autonomous manipulation of articulated objects remains a fundamental challenge for robots in human environments. Vision-based methods can infer hidden kinematics but can yield imprecise estimates on unfamiliar objects. Tactile approaches achieve robust control through contact feedback but require accurate initialization. This suggests a natural synergy: vision for global guidance, touch for local precision. Yet no framework systematically exploits this complementarity for generalized articulated manipulation. Here we present Vi-TacMan, which uses vision to propose grasps and coarse directions that seed a tactile controller for precise execution. By incorporating surface normals as geometric priors and modeling directions via von Mises-Fisher distributions, our approach achieves significant gains over baselines (all p<0.0001). Critically, manipulation succeeds without explicit kinematic models -- the tactile controller refines coarse visual estimates through real-time contact regulation. Tests on more than 50,000 simulated and diverse real-world objects confirm robust cross-category generalization. This work establishes that coarse visual cues suffice for reliable manipulation when coupled with tactile feedback, offering a scalable paradigm for autonomous systems in unstructured environments.
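
To make the vision-seeds-tactile idea concrete, here is a minimal control-loop sketch in which a coarse, vision-proposed direction is repeatedly blended with a contact-derived correction. The helper callables, the blending rule, and the toy usage are assumptions for illustration, not the authors' controller.

```python
import numpy as np

def coarse_to_fine_push(coarse_dir, read_tactile, move, alpha=0.3,
                        step_size=0.01, max_steps=200):
    """Refine a vision-proposed coarse direction with tactile feedback.

    coarse_dir   : 3-vector from the vision module (need not be exact).
    read_tactile : callable returning a corrective 3-vector estimated from
                   contact (e.g. a shear/slip cue), or None when done.
    move         : callable executing a small Cartesian displacement.
    """
    direction = np.asarray(coarse_dir, float)
    direction /= np.linalg.norm(direction)
    for _ in range(max_steps):
        correction = read_tactile()
        if correction is None:              # task finished or contact lost
            break
        # Blend the running direction with the contact-derived correction.
        direction = (1 - alpha) * direction + alpha * np.asarray(correction, float)
        direction /= np.linalg.norm(direction)
        move(step_size * direction)
    return direction

# Toy usage: the "true" joint axis is +y, vision proposes a tilted guess,
# and a fake tactile sensor always points back toward the true axis.
true_axis = np.array([0.0, 1.0, 0.0])
refined = coarse_to_fine_push(
    coarse_dir=[0.4, 0.9, 0.0],
    read_tactile=lambda: true_axis,
    move=lambda delta: None,
)
print(refined)   # converges toward [0, 1, 0]
```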
Problem

Research questions and friction points this paper is trying to address.

Combining vision and touch for articulated object manipulation
Overcoming imprecise visual estimates with tactile feedback
Enabling model-free manipulation through real-time contact regulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision proposes grasps and coarse directions
Tactile controller refines estimates via contact
Uses surface normals as geometric priors (see the sketch after this list)
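
As a sketch of how such a surface-normal prior might be obtained from perception, the snippet below estimates the normal at a query point via PCA over its k nearest neighbors in a point cloud. The neighborhood size and the synthetic plane data are illustrative assumptions, not the paper's pipeline.

```python
import numpy as np

def estimate_normal(points, query, k=30):
    """Estimate the surface normal at `query` as the least-variance
    principal axis of its k nearest neighbors in `points` (N x 3)."""
    d2 = np.sum((points - query) ** 2, axis=1)
    nbrs = points[np.argsort(d2)[:k]]
    centered = nbrs - nbrs.mean(axis=0)
    # The right-singular vector with the smallest singular value is the normal.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    normal = vt[-1]
    return normal / np.linalg.norm(normal)

# Toy check on a noisy horizontal plane: the normal should be close to +/- z.
rng = np.random.default_rng(0)
plane = np.c_[rng.uniform(-1, 1, (500, 2)), 0.01 * rng.normal(size=500)]
print(estimate_normal(plane, query=np.zeros(3)))
```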