🤖 AI Summary
The Open X-Embodiment (OXE) dataset exhibits severe embodiment imbalance: four robot types account for over 85% of its real-world trajectories, which hinders cross-embodiment policy generalization. To address this, we propose a large-scale, simulation-augmented cross-embodiment data augmentation framework. We introduce AugE-Toolkit, a scalable, open-source robot augmentation toolkit, and construct OXE-AugE, a publicly available augmented dataset with over 4.4 million high-quality trajectories that extends OXE with nine additional robot-arm/gripper combinations. Our method integrates kinematics-driven robot morphology substitution, multi-fidelity sim-to-real alignment, and a cross-embodiment policy fine-tuning paradigm. Experiments demonstrate that generalist policies, including OpenVLA and π₀, fine-tuned on OXE-AugE achieve 24–45% higher manipulation success rates on unseen robot-gripper configurations, while also improving performance on unseen robots and robustness to distribution shift on the original robots.
📝 Abstract
Large and diverse datasets are needed to train generalist robot policies that have the potential to control a variety of robot embodiments (robot arm and gripper combinations) across diverse tasks and environments. Because re-collecting demonstrations and retraining for each new hardware platform are prohibitively costly, we show that existing robot data can be augmented for transfer and generalization. The Open X-Embodiment (OXE) dataset, which aggregates demonstrations from over 60 robot datasets, has been widely used as the foundation for training generalist policies. However, it is highly imbalanced: the top four robot types account for over 85% of its real data, which risks overfitting to robot--scene combinations. We present AugE-Toolkit, a scalable robot augmentation pipeline, and OXE-AugE, a high-quality open-source dataset that augments OXE with 9 different robot embodiments. OXE-AugE provides over 4.4 million trajectories, more than triple the size of the original OXE. We conduct a systematic study of how scaling robot augmentation impacts cross-embodiment learning. Results suggest that augmenting datasets with diverse arms and grippers improves policy performance not only on the augmented robots, but also on unseen robots and even on the original robots under distribution shifts. In physical experiments, we demonstrate that state-of-the-art generalist policies such as OpenVLA and $\pi_0$ benefit from fine-tuning on OXE-AugE, improving success rates by 24-45% on previously unseen robot--gripper combinations across four real-world manipulation tasks. Project website: https://OXE-AugE.github.io/.