ReBot: Scaling Robot Learning with Real-to-Sim-to-Real Robotic Video Synthesis

📅 2025-03-15
🤖 AI Summary
High data acquisition costs for real-world robotic manipulation hinder the generalization of vision-language-action (VLA) models. To address this, we propose a real-to-sim-to-real closed-loop video synthesis framework: real robot trajectories are replayed in simulation, integrated with video-level background matting, motion fusion, and physically consistent temporal synthesis to generate high-fidelity, scalable synthetic manipulation videos. This work introduces the first “real → sim → real” video synthesis paradigm, uniquely balancing realism and scalability while enabling fully automated domain adaptation for VLA models. Experiments demonstrate substantial improvements: on SimplerEnv, Octo and OpenVLA achieve +7.2% and +21.8% gains in in-domain performance, and +19.9% and +9.4% improvements in cross-domain generalization, respectively. On the Franka real robot platform, task success rates increase by 17% and 20%, validating the framework’s effectiveness in bridging the reality gap for VLA model training and deployment.

📝 Abstract
Vision-language-action (VLA) models present a promising paradigm by training policies directly on real robot datasets like Open X-Embodiment. However, the high cost of real-world data collection hinders further data scaling, thereby restricting the generalizability of VLAs. In this paper, we introduce ReBot, a novel real-to-sim-to-real approach for scaling real robot datasets and adapting VLA models to target domains, which is the last-mile deployment challenge in robot manipulation. Specifically, ReBot replays real-world robot trajectories in simulation to diversify manipulated objects (real-to-sim), and integrates the simulated movements with inpainted real-world background to synthesize physically realistic and temporally consistent robot videos (sim-to-real). Our approach has several advantages: 1) it enjoys the benefit of real data to minimize the sim-to-real gap; 2) it leverages the scalability of simulation; and 3) it can generalize a pretrained VLA to a target domain with fully automated data pipelines. Extensive experiments in both simulation and real-world environments show that ReBot significantly enhances the performance and robustness of VLAs. For example, in SimplerEnv with the WidowX robot, ReBot improved the in-domain performance of Octo by 7.2% and OpenVLA by 21.8%, and out-of-domain generalization by 19.9% and 9.4%, respectively. For real-world evaluation with a Franka robot, ReBot increased the success rates of Octo by 17% and OpenVLA by 20%. More information can be found at: https://yuffish.github.io/rebot/
Problem

Research questions and friction points this paper is trying to address.

High cost of real-world data collection limits VLA model scalability.
ReBot addresses the sim-to-real gap in robot manipulation tasks.
ReBot enhances VLA model performance and domain generalization.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Real-to-sim-to-real approach for robot learning
Synthesizes realistic robot videos with simulation
Automated data pipelines for VLA model adaptation
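
The sim-to-real step described in the abstract combines simulated movements with an inpainted real-world background. One generic way to picture that combination is per-frame alpha compositing, sketched below. This is only an illustration under stated assumptions, not the authors' implementation: the function names `composite_frame` and `composite_video`, the use of NumPy, and the idea that the simulator provides a per-pixel foreground mask are all hypothetical.

```python
import numpy as np


def composite_frame(sim_rgb, sim_mask, background_rgb):
    """Alpha-blend a simulated foreground over a background frame.

    Assumed inputs (hypothetical, not taken from the paper):
      sim_rgb, background_rgb: uint8 arrays of shape (H, W, 3)
      sim_mask: float array of shape (H, W) with values in [0, 1],
                1 where the simulated robot or object is visible
    """
    # Add a channel axis so the mask broadcasts over the RGB channels.
    alpha = sim_mask[..., None].astype(np.float32)
    blended = (alpha * sim_rgb.astype(np.float32)
               + (1.0 - alpha) * background_rgb.astype(np.float32))
    return np.clip(np.rint(blended), 0, 255).astype(np.uint8)


def composite_video(sim_frames, sim_masks, background_frames):
    """Apply composite_frame to each aligned triple of frames."""
    if not (len(sim_frames) == len(sim_masks) == len(background_frames)):
        raise ValueError("all three sequences must have the same number of frames")
    return [
        composite_frame(frame, mask, background)
        for frame, mask, background in zip(sim_frames, sim_masks, background_frames)
    ]
```

Blending frame by frame keeps the sketch simple. It does not by itself ensure the physical realism or temporal consistency that the abstract attributes to ReBot.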