🤖 AI Summary
Industrial robotic insertion of flexible flat cables (FFCs) demands sub-millimeter precision, yet their high deformability introduces significant operational uncertainty. Conventional approaches rely on manually engineered trajectories, while direct reinforcement learning (RL) training in real environments poses safety hazards and risks of equipment damage.
Method: We propose a foundation-model-based Sim2Real transfer framework: SAM2 and vision-language models generate zero-shot segmentation masks to enable teaching-free, contactless visual feature extraction and policy training in simulation. The RL policy, trained solely in simulation, transfers zero-shot to the physical domain without fine-tuning.
Results: Experiments demonstrate successful high-precision FFC insertion on a real production line. Training efficiency is substantially improved, and all risks of hardware damage are eliminated. Our approach establishes a scalable, robust paradigm for automated assembly of highly deformable components.
📝 Abstract
The industrial insertion of flexible flat cables (FFCs) into receptacles presents a significant challenge owing to the need for submillimeter precision when handling the deformable cables. In manufacturing processes, FFC insertion with robotic manipulators often requires laborious human-guided trajectory generation. While reinforcement learning (RL) offers a way to automate this task without modeling the complex properties of FFCs, the nondeterminism caused by their deformability demands significant training time and effort. Moreover, training directly in a real environment is dangerous, as industrial robots move fast and lack intrinsic safety measures. We propose an RL algorithm for FFC insertion that leverages a foundation-model-based real-to-sim approach to reduce training time and eliminate the risk of physical damage to robots and their surroundings. Training is performed entirely in simulation, allowing random exploration without risk of physical damage. Sim-to-real transfer is achieved through semantic segmentation masks, which retain only the visual features relevant to the insertion task, such as the geometric and spatial information of the cables and receptacles. To enhance generality, we use a foundation model, Segment Anything Model 2 (SAM2). To eliminate human intervention, we employ a vision-language model (VLM) to automate the initial prompting of SAM2 for finding segmentation masks. In the experiments, our method exhibits zero-shot capability, enabling direct deployment to real environments without fine-tuning.
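To make the mask-based transfer idea concrete, here is a minimal sketch (not from the paper) of how a binary segmentation mask, such as one SAM2 might produce for a cable or receptacle, can be reduced to simple geometric features (centroid and principal-axis orientation) that a policy could consume identically in simulation and on hardware. The function name and the specific feature choice are illustrative assumptions.

```python
# Illustrative sketch: turn a binary segmentation mask into geometric features
# (centroid + principal-axis angle). The feature set is an assumption, not the
# paper's exact observation space.
import math

def mask_features(mask):
    """mask: 2D list of 0/1 values.
    Returns (centroid_row, centroid_col, angle_rad)."""
    pts = [(r, c) for r, row in enumerate(mask)
           for c, v in enumerate(row) if v]
    n = len(pts)
    cr = sum(r for r, _ in pts) / n  # centroid row
    cc = sum(c for _, c in pts) / n  # centroid column
    # Second central moments give the orientation of the principal axis.
    mrr = sum((r - cr) ** 2 for r, _ in pts) / n
    mcc = sum((c - cc) ** 2 for _, c in pts) / n
    mrc = sum((r - cr) * (c - cc) for r, c in pts) / n
    angle = 0.5 * math.atan2(2 * mrc, mcc - mrr)
    return cr, cc, angle

# A horizontal strip of mask pixels (e.g., a flat cable seen edge-on):
strip = [[0, 0, 0, 0],
         [1, 1, 1, 1],
         [0, 0, 0, 0]]
print(mask_features(strip))  # (1.0, 1.5, 0.0): centered on row 1, horizontal
```

Because such features depend only on the mask and not on texture or lighting, the same extraction code can run on rendered simulation frames and real camera images, which is the property the sim-to-real transfer relies on.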