OXE-AugE: A Large-Scale Robot Augmentation of OXE for Scaling Cross-Embodiment Policy Learning

📅 2025-12-15
🤖 AI Summary
The Open X-Embodiment (OXE) dataset is heavily imbalanced across embodiments: its top four robot types account for over 85% of the real-world data, which hinders cross-embodiment policy generalization. To address this, we present AugE-Toolkit, a scalable, open-source robot augmentation pipeline, and OXE-AugE, a publicly available dataset that augments OXE with nine additional robot embodiments (arm and gripper combinations) and provides over 4.4 million trajectories, more than triple the size of the original OXE. A systematic scaling study suggests that augmenting with diverse arms and grippers improves policy performance on the augmented robots, on unseen robots, and on the original robots under distribution shift. In physical experiments, generalist policies such as OpenVLA and π₀ fine-tuned on OXE-AugE achieve 24–45% higher success rates on previously unseen robot-gripper combinations across four real-world manipulation tasks.

📝 Abstract
Large and diverse datasets are needed for training generalist robot policies that have the potential to control a variety of robot embodiments (robot arm and gripper combinations) across diverse tasks and environments. As re-collecting demonstrations and retraining for each new hardware platform are prohibitively costly, we show that existing robot data can be augmented for transfer and generalization. The Open X-Embodiment (OXE) dataset, which aggregates demonstrations from over 60 robot datasets, has been widely used as the foundation for training generalist policies. However, it is highly imbalanced: the top four robot types account for over 85% of its real data, which risks overfitting to robot-scene combinations. We present AugE-Toolkit, a scalable robot augmentation pipeline, and OXE-AugE, a high-quality open-source dataset that augments OXE with 9 different robot embodiments. OXE-AugE provides over 4.4 million trajectories, more than triple the size of the original OXE. We conduct a systematic study of how scaling robot augmentation impacts cross-embodiment learning. Results suggest that augmenting datasets with diverse arms and grippers improves policy performance not only on the augmented robots, but also on unseen robots and even the original robots under distribution shifts. In physical experiments, we demonstrate that state-of-the-art generalist policies such as OpenVLA and $\pi_0$ benefit from fine-tuning on OXE-AugE, improving success rates by 24-45% on previously unseen robot-gripper combinations across four real-world manipulation tasks. Project website: https://OXE-AugE.github.io/.
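
The imbalance figure quoted in the abstract (top four robot types covering over 85% of the real data) is a top-k share over per-trajectory embodiment labels. The sketch below shows one way such a share could be computed. It is not taken from AugE-Toolkit: the function name `top_k_share` is hypothetical, and the labels and counts are invented purely for illustration and are not OXE statistics.

```python
from collections import Counter


def top_k_share(embodiment_labels, k=4):
    """Return the fraction of trajectories that belong to the k most
    frequent embodiments. `embodiment_labels` holds one label per trajectory."""
    counts = Counter(embodiment_labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no trajectories given")
    # most_common(k) returns up to k (label, count) pairs, largest first.
    top = sum(n for _, n in counts.most_common(k))
    return top / total


# Toy labels, invented for illustration only (NOT real OXE counts).
toy_labels = (
    ["robot_a"] * 50
    + ["robot_b"] * 20
    + ["robot_c"] * 10
    + ["robot_d"] * 10
    + ["robot_e"] * 6
    + ["robot_f"] * 4
)

share = top_k_share(toy_labels, k=4)
print(f"top-4 share: {share:.2f}")  # prints "top-4 share: 0.90"
```

Running the same function before and after adding augmented trajectories would give a rough view of how much an augmentation flattens the embodiment distribution, under the assumption that each trajectory carries a single embodiment label.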
Problem

Research questions and friction points this paper is trying to address.

Augmenting imbalanced robot datasets to improve generalization
Scaling cross-embodiment learning with diverse robot arms and grippers
Enhancing policy performance on unseen robot hardware via data augmentation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Augments existing robot data for cross-embodiment transfer
Creates scalable pipeline to generate diverse robot embodiments
Enhances policy performance on unseen robots via fine-tuning
Authors

Guanhua Ji
Department of EECS, UC Berkeley
Harsha Polavaram
Department of EECS, UC Berkeley
Lawrence Yunliang Chen
PhD Student, UC Berkeley
Robotics, Machine Learning
Sandeep Bajamahal
Department of EECS, UC Berkeley
Zehan Ma
Department of EECS, UC Berkeley
Simeon Adeboda
Department of EECS, UC Berkeley
Chenfeng Xu
UC Berkeley
Efficient Generative AI, Efficient Machine Learning, Efficient Computation, AI Systems, Robotics
Ken Goldberg
Professor, UC Berkeley and UCSF
Robots, Robotics, Automation, Collaborative Filtering