Best of Sim and Real: Decoupled Visuomotor Manipulation via Learning Control in Simulation and Perception in Real

📅 2025-09-30
🤖 AI Summary
In robot manipulation, the tight coupling between perception and control in end-to-end learning hinders sim-to-real transfer. To address this, we propose a decoupled learning framework: a control policy is trained in simulation using privileged state information (e.g., object pose) and then frozen; in the real world, only a small number of demonstrations (10–20) are needed to align the perception module with the frozen policy. Explicitly separating perception and control training avoids the instability of cross-domain joint optimization. Experiments on tabletop manipulation tasks show that the method outperforms end-to-end baselines in data efficiency and generalizes to out-of-distribution object positions and scales.

📝 Abstract
Sim-to-real transfer remains a fundamental challenge in robot manipulation due to the entanglement of perception and control in end-to-end learning. We present a decoupled framework that learns each component where it is most reliable: control policies are trained in simulation with privileged state to master spatial layouts and manipulation dynamics, while perception is adapted only at deployment to bridge real observations to the frozen control policy. Our key insight is that control strategies and action patterns are universal across environments and can be learned in simulation through systematic randomization, while perception is inherently domain-specific and must be learned where visual observations are authentic. Unlike existing end-to-end approaches that require extensive real-world data, our method achieves strong performance with only 10-20 real demonstrations by reducing the complex sim-to-real problem to a structured perception alignment task. We validate our approach on tabletop manipulation tasks, demonstrating superior data efficiency and out-of-distribution generalization compared to end-to-end baselines. The learned policies successfully handle object positions and scales beyond the training distribution, confirming that decoupling perception from control fundamentally improves sim-to-real transfer.
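The decoupling described in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and not the paper's implementation: the hand-written proportional controller stands in for a policy learned in simulation, the affine "camera" (`A_TRUE`, `B_TRUE`) stands in for real visual observations, and the least-squares fit stands in for whatever perception-alignment procedure the authors actually use.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1 (simulation): a control policy that consumes privileged state.
# Hypothetical stand-in: a proportional controller. It is treated as frozen
# from here on and is never updated with real data.
def frozen_control_policy(state):
    """state = [gripper_xy (2), object_xy (2)] -> gripper velocity (2)."""
    gripper_xy, object_xy = state[:2], state[2:]
    return 0.5 * (object_xy - gripper_xy)

# Stage 2 (real world): a hypothetical "camera" that maps the true object
# position to a noisy 4-dim feature vector. The perception module does not
# know A_TRUE or B_TRUE.
A_TRUE = np.array([[1.0, 0.2], [-0.3, 0.9], [0.5, -0.5], [0.1, 0.7]])
B_TRUE = np.array([0.3, -0.1, 0.0, 0.2])

def observe(object_xy):
    return A_TRUE @ object_xy + B_TRUE + 0.01 * rng.normal(size=4)

# A small set of real demonstrations (15, within the 10-20 range quoted
# above), each pairing an observation with the state the policy expects.
n_demos = 15
demo_object_xy = rng.uniform(-1.0, 1.0, size=(n_demos, 2))
demo_obs = np.stack([observe(p) for p in demo_object_xy])

# Perception alignment: fit an affine map observation -> object position
# by least squares. Only W is learned from real data.
X = np.hstack([demo_obs, np.ones((n_demos, 1))])
W, *_ = np.linalg.lstsq(X, demo_object_xy, rcond=None)

def perceive(obs):
    return np.append(obs, 1.0) @ W

def act(gripper_xy, obs):
    """Deployment: perception output feeds the frozen policy unchanged."""
    state = np.concatenate([gripper_xy, perceive(obs)])
    return frozen_control_policy(state)

# Object placed outside the [-1, 1] range covered by the demonstrations.
test_object_xy = np.array([1.5, -1.4])
estimate = perceive(observe(test_object_xy))
action = act(np.zeros(2), observe(test_object_xy))
print("pose estimate:", estimate, "action:", action)
```

In this sketch the real-world data touches only `W`; `frozen_control_policy` is reused as-is, so the transfer problem reduces to regressing the state the policy expects from observations.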
Problem

Research questions and friction points this paper is trying to address.

Perception and control are entangled in end-to-end learning, which makes sim-to-real transfer difficult
Existing end-to-end approaches require extensive real-world data
Sim-to-real transfer is treated as one complex problem rather than a structured perception alignment task
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decouples perception and control for robot manipulation
Trains the control policy in simulation with privileged state, freezes it, and adapts only perception in the real world
Requires only 10-20 real demonstrations for sim-to-real transfer