🤖 AI Summary
This work addresses the performance degradation in sim-to-real transfer for robotic assembly and the limitations of purely real-world reinforcement learning, which often relies on human supervision and generalizes poorly. To overcome these challenges, the authors propose a heterogeneous hybrid policy architecture: a low-level state-based base policy is trained in simulation to serve as a behavioral prior, while a residual policy is learned directly in the real world from visual observations and sparse rewards to compensate for discrepancies in dynamics and perception. This approach requires no human intervention and enables efficient adaptation. Evaluated on multiple two-part assembly tasks, the method achieves near-perfect success rates, improving by 38.4% over existing zero-shot sim-to-real approaches, and reduces cycle time by 29.7%, demonstrating a strong balance of success rate, generalization, and efficiency.
📝 Abstract
Robotic assembly is a long-standing challenge because it demands precise, contact-rich manipulation. While simulation-based learning has enabled robust assembly policies, their performance often degrades when deployed in the real world due to the sim-to-real gap. Conversely, real-world reinforcement learning (RL) methods avoid the sim-to-real gap but rely heavily on human supervision and generalize poorly to environmental changes. In this work, we propose a hybrid approach that combines a simulation-trained base policy with a real-world residual policy to adapt efficiently to real-world variations. The base policy, trained in simulation on low-level state observations with dense rewards, provides a strong behavioral prior. The residual policy, learned in the real world from visual observations with sparse rewards, compensates for discrepancies in dynamics and for sensor noise. Extensive real-world experiments demonstrate that our method, SPARR, achieves near-perfect success rates across diverse two-part assembly tasks. Compared to state-of-the-art zero-shot sim-to-real methods, SPARR improves success rates by 38.4% while reducing cycle time by 29.7%. Moreover, SPARR requires no human expertise, in contrast to state-of-the-art real-world RL approaches that depend heavily on human supervision.
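The abstract does not give implementation details, but the base-plus-residual composition it describes can be illustrated with a minimal sketch. All class names, the residual bound, and the placeholder controllers below are assumptions for illustration, not the authors' actual SPARR implementation; the essential idea is that the executed action is the frozen base policy's action plus a small learned correction.

```python
import numpy as np

class BasePolicy:
    """Stand-in for the frozen, simulation-trained policy acting on low-level state."""
    def act(self, state):
        # Placeholder behavior: a proportional controller toward a target pose at the origin.
        target = np.zeros_like(state)
        return np.clip(0.5 * (target - state), -1.0, 1.0)

class ResidualPolicy:
    """Stand-in for the real-world policy that outputs a bounded correction from vision."""
    def __init__(self, scale=0.1):
        # Small scale keeps the residual from overriding the base prior early in training.
        self.scale = scale

    def act(self, visual_obs):
        # Placeholder for a learned network: map the image to a bounded 3-D correction.
        return self.scale * np.tanh(visual_obs.mean()) * np.ones(3)

def hybrid_action(base, residual, state, visual_obs):
    """Executed action = base action + learned residual correction, clipped to limits."""
    a = base.act(state) + residual.act(visual_obs)
    return np.clip(a, -1.0, 1.0)

base, residual = BasePolicy(), ResidualPolicy()
state = np.array([0.4, -0.2, 0.1])       # low-level robot/part state (sim-style input)
visual_obs = np.ones((4, 4))             # dummy camera observation (real-world input)
action = hybrid_action(base, residual, state, visual_obs)
```

Because the residual is bounded and additive, the system behaves like the simulation prior at the start of real-world training and only deviates where real dynamics or perception disagree with simulation.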