🤖 AI Summary
This work addresses “last-millimeter” feedback failure in precision assembly, where visual occlusion from the end-effector and workpiece breaks the feedback loop, by proposing a vision-tactile imitation learning approach. The method combines a Transformer-based bidirectional visual-tactile cross-attention mechanism, a proprioceptive gating network, and a tactile reconstruction loss, dynamically increasing reliance on tactile feedback when vision is compromised while guiding the model to learn task-relevant contact features. Evaluated on the NIST Assembly Task Board M1 benchmark, the approach achieves a 90% success rate on peg-in-hole insertion and maintains 80% success even at an industrial-grade 0.1 mm clearance, substantially outperforming vision-only and generalist multimodal baselines.
📝 Abstract
Precision assembly requires sub-millimeter corrections in contact-rich "last-millimeter" regions where visual feedback fails due to occlusion from the end-effector and workpiece. We present ReTac-ACT (Reconstruction-enhanced Tactile ACT), a vision-tactile imitation learning policy that addresses this challenge through three synergistic mechanisms: (i) bidirectional cross-attention enabling reciprocal visuo-tactile feature enhancement before fusion, (ii) a proprioception-conditioned gating network that dynamically elevates tactile reliance when visual occlusion occurs, and (iii) a tactile reconstruction objective that enforces learning of manipulation-relevant contact information rather than generic visual textures. Evaluated on the standardized NIST Assembly Task Board M1 benchmark, ReTac-ACT achieves 90% peg-in-hole success, substantially outperforming vision-only and generalist baseline methods, and maintains 80% success at an industrial-grade 0.1 mm clearance. Ablation studies confirm that each architectural component is indispensable. The ReTac-ACT codebase and a vision-tactile demonstration dataset spanning multiple clearance levels will be released to support reproducible research.
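
The abstract names three architectural mechanisms. As a reading aid, below is a minimal PyTorch-style sketch of how such components could be wired together; every module name, dimension, the softmax gating design, and the mean-squared reconstruction target are illustrative assumptions, not the authors' released ReTac-ACT implementation.

```python
import torch
import torch.nn as nn


class BidirectionalCrossAttention(nn.Module):
    """Visual and tactile token streams attend to each other before fusion."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.vis_from_tac = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.tac_from_vis = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, vis, tac):
        # Each modality is enhanced with context queried from the other.
        vis_enh, _ = self.vis_from_tac(vis, tac, tac)
        tac_enh, _ = self.tac_from_vis(tac, vis, vis)
        return vis + vis_enh, tac + tac_enh


class ProprioGatedFusion(nn.Module):
    """Proprioception-conditioned gate that re-weights the two modalities,
    e.g. raising the tactile weight when the arm pose implies occlusion."""
    def __init__(self, proprio_dim=14):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(proprio_dim, 64), nn.ReLU(),
                                  nn.Linear(64, 2), nn.Softmax(dim=-1))

    def forward(self, vis, tac, proprio):
        w = self.gate(proprio)  # (B, 2) modality weights summing to 1
        # Scale each modality's tokens, then hand the joint sequence to the policy decoder.
        return torch.cat([w[:, :1, None] * vis, w[:, 1:, None] * tac], dim=1)


class TactileReconstructionHead(nn.Module):
    """Auxiliary decoder whose loss pushes the tactile encoding to retain
    contact-relevant signal rather than generic visual texture."""
    def __init__(self, dim=256, tactile_dim=512):
        super().__init__()
        self.decoder = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(),
                                     nn.Linear(256, tactile_dim))

    def forward(self, tac_tokens, tac_target):
        recon = self.decoder(tac_tokens.mean(dim=1))  # pool tokens, decode raw signal
        return nn.functional.mse_loss(recon, tac_target)


# Shape check: batch of 2, 16 visual tokens, 4 tactile tokens.
vis, tac = torch.randn(2, 16, 256), torch.randn(2, 4, 256)
proprio, tac_raw = torch.randn(2, 14), torch.randn(2, 512)
vis_e, tac_e = BidirectionalCrossAttention()(vis, tac)
fused = ProprioGatedFusion()(vis_e, tac_e, proprio)     # (2, 20, 256) tokens for the policy decoder
aux_loss = TactileReconstructionHead()(tac_e, tac_raw)  # added to the imitation (action) loss
```

In an ACT-style policy, the fused token sequence would condition a Transformer decoder that predicts action chunks, with the reconstruction term added to the imitation loss as an auxiliary objective; the exact fusion point and loss weighting here are assumptions.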