🤖 AI Summary
This paper addresses the domain gap in sim-to-real transfer for robotic imitation learning by studying a co-training framework that jointly uses simulated and real-world demonstrations to train diffusion-based policies, and it systematically investigates the mechanisms and performance limits of this co-training on planar pushing from camera images. Methodologically, it combines diffusion modeling, behavioral cloning, and binary domain probing for disentangled analysis. Key findings include: (i) some visual domain gap appears to help the co-trained policy rather than hurt it; (ii) for non-prehensile manipulation, closing the physics gap may matter more than visual fidelity; and (iii) high-performing policies learn to distinguish simulated from real inputs, which is associated with positive transfer. Co-training with simulated data can substantially improve real-world performance, especially when real data is limited. Gains from simulated data plateau beyond a saturation point, whereas more real-world data raises the performance ceiling. The study spans over 40 real-world policies (evaluated on 800+ trials) and 200 simulated policies (evaluated on 40,000+ trials), offering an empirical basis and design guidelines for sim-and-real co-training.
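As a rough illustration of what jointly using simulated and real data can look like at the batch level, the sketch below draws each training sample from the real dataset with a fixed probability and from the simulated dataset otherwise. The summary does not say how batches are actually mixed, so the function name `sample_cotraining_batch`, the `real_fraction` parameter, and the per-sample mixing rule are all assumptions made for illustration.

```python
import random


def sample_cotraining_batch(real_data, sim_data, batch_size, real_fraction, rng):
    """Draw a mixed batch of (domain, sample) pairs.

    Hypothetical mixing rule: each slot is filled from the real dataset with
    probability `real_fraction`, otherwise from the simulated dataset.
    """
    batch = []
    for _ in range(batch_size):
        if rng.random() < real_fraction:
            batch.append(("real", rng.choice(real_data)))
        else:
            batch.append(("sim", rng.choice(sim_data)))
    return batch


# Example: a small real dataset mixed with a much larger simulated one.
rng = random.Random(0)
real_demos = list(range(10))           # placeholders for real demonstrations
sim_demos = list(range(100, 2100))     # placeholders for simulated demonstrations
batch = sample_cotraining_batch(real_demos, sim_demos, 64, 0.5, rng)
```

Under this assumption, `real_fraction` controls how strongly the scarce real data is upweighted relative to its share of the combined dataset.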
📝 Abstract
In imitation learning for robotics, cotraining with demonstration data generated both in simulation and on real hardware has emerged as a powerful recipe to overcome the sim2real gap. This work seeks to elucidate basic principles of sim-and-real cotraining to help inform simulation design, sim-and-real dataset creation, and policy training. Focusing narrowly on the canonical task of planar pushing from camera inputs enabled us to be thorough in our study. Our experiments confirm that cotraining with simulated data can dramatically improve real-world performance, especially when real data is limited. Performance gains scale with simulated data but eventually plateau; adding real-world data raises this performance ceiling. The results also suggest that reducing the domain gap in physics may be more important than visual fidelity for non-prehensile manipulation tasks. Perhaps surprisingly, having some visual domain gap actually helps the cotrained policy: binary probes reveal that high-performing policies learn to distinguish simulated domains from real ones. We conclude by investigating this nuance and the mechanisms that facilitate positive transfer between sim and real. In total, our experiments span over 40 real-world policies (evaluated on 800+ trials) and 200 simulated policies (evaluated on 40,000+ trials).