🤖 AI Summary
This work addresses the challenge of unifying search and high-precision insertion in contact-rich assembly tasks, where uncertainty in the relative pose makes the two behaviors hard to model jointly. The authors propose SI-Diff, a framework that learns both behaviors with a force-domain diffusion policy and introduces a mode-conditioning mechanism that lets a single policy switch between search and insertion modes. SI-Diff combines tactile and end-effector velocity observations, teacher–student imitation learning, and a new search teacher policy that generates diverse trajectories. Compared to the TacDiffusion baseline, it extends the tolerance to lateral (x–y) misalignment from 2 mm to 5 mm and shows strong zero-shot transfer to unseen object geometries.
📝 Abstract
Contact-rich assembly is fundamental in robotics but poses significant challenges due to uncertainties in relative poses, such as misalignments and small clearances in peg-in-hole tasks. Existing approaches typically address search and high-precision insertion separately, because these tasks involve distinct action patterns. However, supporting both tasks within a single model, without switching models or weights, is desirable for intelligent assembly systems. In this work, we propose SI-Diff, a framework that learns both search and high-precision insertion through a force-domain diffusion policy. We introduce a new mode-conditioning mechanism that enables the policy to capture distinct action behaviors under a single framework. Moreover, we develop a new search teacher policy that can generate diverse trajectories. By training on successful and efficient demonstrations provided by the teacher policy, the model learns the mapping from tactile and end-effector velocity observations to effective action behaviors. Experiments show that SI-Diff extends the tolerance to x–y misalignments from 2 mm to 5 mm compared to the state-of-the-art baseline, TacDiffusion, while also demonstrating strong zero-shot transferability to unseen shapes.
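The idea of mode conditioning can be illustrated with a minimal sketch. It is not the authors' implementation: the one-hot mode encoding, the 3-D force action, the observation sizes, the noise schedule, and all names (`Mode`, `build_condition`, `toy_denoiser`, `sample_action`, `TOY_TARGETS`) are assumptions made for illustration. The trained network is replaced by a toy noise predictor that reads only the mode flag, so the example shows the conditioning path and a standard DDPM-style sampling loop rather than any learned behavior.

```python
import math
import random
from enum import IntEnum


class Mode(IntEnum):
    SEARCH = 0
    INSERT = 1


# Hypothetical per-mode force commands (N). A trained policy would learn such
# behavior from demonstrations; they are hard-coded here only for the demo.
TOY_TARGETS = {
    Mode.SEARCH: [1.0, -1.0, -2.0],
    Mode.INSERT: [0.0, 0.0, -8.0],
}


def build_condition(tactile, ee_velocity, mode):
    """Concatenate observations with a one-hot mode flag (assumed encoding)."""
    one_hot = [0.0] * len(Mode)
    one_hot[int(mode)] = 1.0
    return list(tactile) + list(ee_velocity) + one_hot


def toy_denoiser(x_t, condition, alpha_bar_t):
    """Stand-in for a learned noise predictor eps_theta(x_t, t, condition).

    It reads only the one-hot mode at the end of the condition and returns
    the noise that would map that mode's toy target to x_t.
    """
    mode = Mode(condition[-len(Mode):].index(1.0))
    target = TOY_TARGETS[mode]
    scale = math.sqrt(1.0 - alpha_bar_t)
    root = math.sqrt(alpha_bar_t)
    return [(x - root * x0) / scale for x, x0 in zip(x_t, target)]


def sample_action(condition, num_steps=20, seed=0):
    """DDPM-style ancestral sampling of one action given the condition."""
    rng = random.Random(seed)
    # Linear beta schedule (an assumption, not taken from the paper).
    betas = [1e-4 + (0.02 - 1e-4) * i / (num_steps - 1) for i in range(num_steps)]
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)

    x = [rng.gauss(0.0, 1.0) for _ in range(3)]  # start from pure noise
    for t in reversed(range(num_steps)):
        beta, alpha_bar = betas[t], alpha_bars[t]
        eps = toy_denoiser(x, condition, alpha_bar)
        coef = beta / math.sqrt(1.0 - alpha_bar)
        mean = [(xi - coef * ei) / math.sqrt(1.0 - beta) for xi, ei in zip(x, eps)]
        # No noise is added on the final step, as in standard DDPM sampling.
        sigma = math.sqrt(beta) if t > 0 else 0.0
        x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
    return x


if __name__ == "__main__":
    tactile = [0.1, 0.2]          # placeholder tactile features
    velocity = [0.0, 0.0, 0.01]   # placeholder end-effector velocity
    for mode in Mode:
        action = sample_action(build_condition(tactile, velocity, mode))
        print(mode.name, [round(a, 3) for a in action])
```

The same sampler and the same (toy) predictor serve both modes; only the flag in the condition changes, which mirrors the stated goal of supporting both tasks without switching models or weights. Because the toy predictor is an oracle for its hard-coded targets, each sample ends on the target of the selected mode.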