Sim-and-Real Co-Training: A Simple Recipe for Vision-Based Robotic Manipulation

📅 2025-03-31
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address the Sim2Real transfer challenge and the high cost of real-world data collection in vision-based robotic manipulation, this paper proposes a lightweight sim-and-real co-training paradigm. Methodologically, we introduce the first systematic, practical recipe for sim-and-real co-training within a behavior cloning framework: generative-AI-synthesized simulation data and a small set of human-demonstrated real data are trained jointly through a shared visual encoder and policy network, with no domain alignment or fine-tuning stage. Our key contributions are: (1) the first empirical validation that such co-training consistently improves real-world performance across diverse platforms (e.g., robot arms and humanoid robots) and tasks; and (2) an average 38% improvement in task success rate on multi-task real-robot benchmarks over real-data-only baselines, demonstrating strong robustness to both visual-appearance and dynamics discrepancies between simulation and reality.
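At its core, the recipe amounts to mixing simulation and real demonstrations in every training batch. A minimal sketch of that mixing step, assuming a per-batch mixing knob (`sim_ratio` and the 75/25 split below are illustrative assumptions, not values from the paper):

```python
import random

def make_cotraining_batch(sim_data, real_data, batch_size=8, sim_ratio=0.75, rng=None):
    """Assemble one co-training batch: a fixed fraction of simulation
    samples (sim_ratio) topped up with real-world samples.
    `sim_ratio` is a hypothetical knob, not a value from the paper."""
    rng = rng or random.Random(0)
    n_sim = int(round(batch_size * sim_ratio))
    n_real = batch_size - n_sim
    batch = [("sim", rng.choice(sim_data)) for _ in range(n_sim)]
    batch += [("real", rng.choice(real_data)) for _ in range(n_real)]
    rng.shuffle(batch)  # interleave sources so every batch sees both domains
    return batch
```

Both sources then flow through the same encoder and policy network; nothing in the batch construction distinguishes the domains beyond the sampling ratio.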

📝 Abstract
Large real-world robot datasets hold great potential to train generalist robot models, but scaling real-world human data collection is time-consuming and resource-intensive. Simulation has great potential in supplementing large-scale data, especially with recent advances in generative AI and automated data generation tools that enable scalable creation of robot behavior datasets. However, training a policy solely in simulation and transferring it to the real world often demands substantial human effort to bridge the reality gap. A compelling alternative is to co-train the policy on a mixture of simulation and real-world datasets. Preliminary studies have recently shown this strategy to substantially improve the performance of a policy over one trained on a limited amount of real-world data. Nonetheless, the community lacks a systematic understanding of sim-and-real co-training and what it takes to reap the benefits of simulation data for real-robot learning. This work presents a simple yet effective recipe for utilizing simulation data to solve vision-based robotic manipulation tasks. We derive this recipe from comprehensive experiments that validate the co-training strategy on various simulation and real-world datasets. Using two domains (a robot arm and a humanoid) across diverse tasks, we demonstrate that simulation data can enhance real-world task performance by an average of 38%, even with notable differences between the simulation and real-world data. Videos and additional results can be found at https://co-training.github.io/
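The co-training objective the abstract describes can be sketched as a weighted sum of the same behavior-cloning loss on a simulation batch and a real batch, with no domain-alignment term. Everything below (scalar stand-ins for the visual encoder and policy head, the `w_sim` mixing weight) is a toy illustration under stated assumptions, not the paper's implementation:

```python
def bc_loss(enc_w, pol_w, obs, actions):
    # Shared "encoder" and "policy head" reduced to scalar weights:
    # a toy stand-in for the paper's visual encoder + policy network.
    preds = [pol_w * (enc_w * o) for o in obs]
    return sum((p - a) ** 2 for p, a in zip(preds, actions)) / len(obs)

def cotraining_loss(enc_w, pol_w, sim_batch, real_batch, w_sim=0.5):
    """Co-training objective: the same behavior-cloning loss applied to
    simulation and real batches through shared weights, combined as a
    weighted sum. `w_sim` is a hypothetical mixing weight."""
    sim_obs, sim_act = sim_batch
    real_obs, real_act = real_batch
    return (w_sim * bc_loss(enc_w, pol_w, sim_obs, sim_act)
            + (1 - w_sim) * bc_loss(enc_w, pol_w, real_obs, real_act))
```

Because both terms share the same parameters, gradient steps on this objective fit simulation and real demonstrations simultaneously, which is the whole of the recipe: no adversarial alignment, no sim-only pretraining followed by fine-tuning.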
Problem

Research questions and friction points this paper is trying to address.

Bridging the reality gap in robotic manipulation training
Enhancing real-world performance with simulation data
Establishing a systematic sim-and-real co-training recipe for vision-based tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Co-training a single policy on mixed simulation and real datasets
Leveraging generative AI for scalable simulation data creation
Improving real-world task success by an average of 38% via simulation data