TouchGuide: Inference-Time Steering of Visuomotor Policies via Touch Guidance

📅 2026-01-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing robotic systems perform poorly on fine-grained, contact-rich manipulation tasks, primarily because they make ineffective use of tactile feedback. This work proposes TouchGuide, a cross-modal fusion approach that leverages tactile guidance at inference time to refine a pre-trained visuomotor policy: the policy first generates a coarse action from visual input, then refines it with a task-specific Contact Physical Model (CPM). TouchGuide is the first method to integrate visual and tactile information within a low-dimensional action space; it constructs the CPM via contrastive learning and pairs it with diffusion or flow-matching policies, alongside TacUMI, a novel cost-effective tactile sensing system. Evaluated on five challenging tasks, including shoe lacing and chip handover, TouchGuide significantly outperforms existing visuo-tactile methods, demonstrating strong effectiveness and generalization.
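The summary notes that the CPM is trained with contrastive learning on expert demonstrations, but gives no training details. Below is a minimal sketch of one standard way such a model could be trained, assuming an InfoNCE-style objective in which expert (action, tactile) pairs serve as positives and noise-perturbed actions as negatives; `cpm`, `infonce_cpm_loss`, and every parameter here are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

# Hedged sketch, not the paper's code: the page only states that the CPM is
# trained contrastively on expert demos. The InfoNCE form and the perturbed-
# action negatives are assumptions for illustration.
def infonce_cpm_loss(cpm, actions, tactile, num_negatives=16, noise_std=0.1):
    """Score expert (action, tactile) pairs above perturbed-action negatives."""
    pos = cpm(actions, tactile)                       # (B,) feasibility logits
    negs = [cpm(actions + noise_std * torch.randn_like(actions), tactile)
            for _ in range(num_negatives)]            # K negatives, each (B,)
    logits = torch.stack([pos] + negs, dim=1)         # (B, 1 + K)
    labels = torch.zeros(actions.shape[0], dtype=torch.long,
                         device=actions.device)       # positive sits at index 0
    return F.cross_entropy(logits, labels)
```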

📝 Abstract
Fine-grained and contact-rich manipulation remains challenging for robots, largely due to the underutilization of tactile feedback. To address this, we introduce TouchGuide, a novel cross-policy visuo-tactile fusion paradigm that fuses modalities within a low-dimensional action space. Specifically, TouchGuide operates in two stages to guide a pre-trained diffusion or flow-matching visuomotor policy at inference time. First, the policy produces a coarse, visually plausible action using only visual inputs during early sampling. Second, a task-specific Contact Physical Model (CPM) provides tactile guidance that steers and refines the action so that it aligns with realistic physical contact conditions. Trained through contrastive learning on limited expert demonstrations, the CPM yields a tactile-informed feasibility score that steers the sampling process toward refined actions satisfying physical contact constraints. Furthermore, to facilitate TouchGuide training with high-quality and cost-effective data, we introduce TacUMI, a data collection system. TacUMI achieves a favorable trade-off between precision and affordability: by leveraging rigid fingertips, it obtains direct tactile feedback, thereby enabling the collection of reliable tactile data. Extensive experiments on five challenging contact-rich tasks, such as shoe lacing and chip handover, show that TouchGuide consistently and significantly outperforms state-of-the-art visuo-tactile policies.
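The abstract describes two-stage inference: early sampling steps use only vision, and later steps are steered by the CPM's feasibility score. As a minimal sketch of what such inference-time steering could look like, assuming a classifier-guidance-style gradient update on the partially denoised action; `denoise_step`, `t_switch`, `guidance_scale`, and the loop structure are all assumptions, since the page does not specify the actual steering rule.

```python
import torch

# Hedged sketch of CPM-guided sampling; the policy and CPM APIs are hypothetical.
def guided_sampling(policy, cpm, visual_obs, tactile_obs,
                    num_steps=50, t_switch=25, guidance_scale=1.0):
    """Stage 1: coarse visual denoising. Stage 2: tactile steering via the CPM."""
    action = torch.randn(1, policy.action_dim)        # start from pure noise
    for t in reversed(range(num_steps)):
        # Visual policy proposes the next, less-noisy action.
        action = policy.denoise_step(action, visual_obs, t)
        if t < t_switch:
            # Ascend the CPM's tactile-informed feasibility score so the
            # refined action satisfies physical contact constraints.
            action = action.detach().requires_grad_(True)
            score = cpm(action, tactile_obs)
            grad = torch.autograd.grad(score.sum(), action)[0]
            action = (action + guidance_scale * grad).detach()
    return action
```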
Problem

Research questions and friction points this paper is trying to address.

fine-grained manipulation
contact-rich manipulation
tactile feedback
visuomotor policies
robotic manipulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

TouchGuide
visuo-tactile fusion
Contact Physical Model
inference-time steering
tactile feedback
👥 Authors

Zhemeng Zhang, Shanghai Jiao Tong University
Jiahua Ma, Sun Yat-sen University
Xincheng Yang, Shanghai Jiao Tong University
Xin Wen, Sun Yat-sen University
Yuzhi Zhang, Sun Yat-sen University
Boyan Li, The Hong Kong University of Science and Technology (Guangzhou)
Yiran Qin, Oxford
Jin Liu, Shanghai Jiao Tong University
Can Zhao, Nvidia
Li Kang, Shanghai AI Laboratory
Haoqin Hong, University of Science and Technology of China
Zhenfei Yin, University of Oxford
Philip Torr, University of Oxford, Department of Engineering
Hao Su, UCSD CSE
Ruimao Zhang, Sun Yat-sen University
Daolin Ma, Department of Engineering Mechanics, SJTU