ReBot: Scaling Robot Learning with Real-to-Sim-to-Real Robotic Video Synthesis

📅 2025-03-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

The high cost of collecting real-world robot manipulation data limits how far vision-language-action (VLA) models can scale and generalize. To address this, we propose ReBot, a real-to-sim-to-real video synthesis framework. Real robot trajectories are replayed in simulation with new manipulated objects, and the simulated motion is composited onto inpainted real-world backgrounds. The result is physically realistic, temporally consistent manipulation videos at scale. The approach combines the realism of real data with the scalability of simulation, and its fully automated pipeline can adapt a pretrained VLA model to a target domain. In SimplerEnv with the WidowX robot, ReBot improves in-domain performance by 7.2% for Octo and 21.8% for OpenVLA. It improves out-of-domain generalization by 19.9% and 9.4%, respectively. On a real Franka robot, it raises task success rates by 17% for Octo and 20% for OpenVLA, indicating that it narrows the sim-to-real gap for VLA training and deployment.

📝 Abstract

Vision-language-action (VLA) models present a promising paradigm by training policies directly on real robot datasets like Open X-Embodiment. However, the high cost of real-world data collection hinders further data scaling, thereby restricting the generalizability of VLAs. In this paper, we introduce ReBot, a novel real-to-sim-to-real approach for scaling real robot datasets and adapting VLA models to target domains, which is the last-mile deployment challenge in robot manipulation. Specifically, ReBot replays real-world robot trajectories in simulation to diversify manipulated objects (real-to-sim), and integrates the simulated movements with inpainted real-world background to synthesize physically realistic and temporally consistent robot videos (sim-to-real). Our approach has several advantages: 1) it enjoys the benefit of real data to minimize the sim-to-real gap; 2) it leverages the scalability of simulation; and 3) it can generalize a pretrained VLA to a target domain with fully automated data pipelines. Extensive experiments in both simulation and real-world environments show that ReBot significantly enhances the performance and robustness of VLAs. For example, in SimplerEnv with the WidowX robot, ReBot improved the in-domain performance of Octo by 7.2% and OpenVLA by 21.8%, and out-of-domain generalization by 19.9% and 9.4%, respectively. For real-world evaluation with a Franka robot, ReBot increased the success rates of Octo by 17% and OpenVLA by 20%. More information can be found at: https://yuffish.github.io/rebot/
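
The sim-to-real step described above (integrating simulated movements with an inpainted real background) can be illustrated with a toy compositing sketch. It assumes grayscale frames stored as nested Python lists, a binary mask marking the rendered robot and object pixels, and one inpainted real background frame per time step. The function names are hypothetical, and this hard-mask overlay is a simplification, not ReBot's actual implementation.

```python
from typing import List

Frame = List[List[int]]  # 2D grid of grayscale pixels (0-255)
Mask = List[List[int]]   # 2D grid of 0/1 values; 1 marks a simulated pixel


def composite_frame(sim_frame: Frame, sim_mask: Mask, background: Frame) -> Frame:
    """Overlay simulated robot/object pixels on an inpainted real background.

    Where the mask is 1 the simulated pixel is kept; elsewhere the
    background pixel is used (a hard, binary alpha blend).
    """
    if not len(sim_frame) == len(sim_mask) == len(background):
        raise ValueError("frame, mask and background must share a height")
    out: Frame = []
    for sim_row, mask_row, bg_row in zip(sim_frame, sim_mask, background):
        if not len(sim_row) == len(mask_row) == len(bg_row):
            raise ValueError("rows must share a width")
        out.append([s if m else b for s, m, b in zip(sim_row, mask_row, bg_row)])
    return out


def composite_video(
    sim_frames: List[Frame], sim_masks: List[Mask], backgrounds: List[Frame]
) -> List[Frame]:
    """Composite each simulated frame onto the background frame at the same time step."""
    if not len(sim_frames) == len(sim_masks) == len(backgrounds):
        raise ValueError("need one mask and one background per frame")
    return [
        composite_frame(f, m, b)
        for f, m, b in zip(sim_frames, sim_masks, backgrounds)
    ]


if __name__ == "__main__":
    bg = [[10, 10], [10, 10]]
    frames = [[[200, 0], [0, 0]], [[0, 200], [0, 0]]]
    masks = [[[1, 0], [0, 0]], [[0, 1], [0, 0]]]
    print(composite_video(frames, masks, [bg, bg]))
    # [[[200, 10], [10, 10]], [[10, 200], [10, 10]]]
```

In this sketch, pairing each simulated frame with the background frame from the same time step keeps the scene aligned with the original real video, and only the foreground changes. A practical pipeline would likely use soft or anti-aliased masks; the hard mask keeps the example minimal.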

Problem

Research questions and friction points this paper is trying to address.

High cost of real-world data collection limits VLA model scalability.

Purely simulated data suffers from a sim-to-real gap in robot manipulation.

Adapting pretrained VLA models to a target domain remains the last-mile deployment challenge.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Real-to-sim-to-real approach for robot learning

Synthesizes physically realistic, temporally consistent robot videos by compositing simulated motion onto inpainted real backgrounds

Automated data pipelines for VLA model adaptation
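
The scaling idea behind the automated pipeline can be sketched as a cross product of real trajectories and candidate objects. With N trajectories and M objects, this yields up to N × M synthetic episodes. Everything here is hypothetical: the `Episode` type, the instruction template, and the `keep` predicate, which stands in for whatever check decides that a simulated replay is usable. The simulator replay and rendering themselves are not shown.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List


@dataclass(frozen=True)
class Episode:
    trajectory_id: str  # hypothetical ID of a real-world trajectory to replay
    object_name: str    # hypothetical object substituted in simulation
    instruction: str    # language instruction for the synthetic episode


def keep_all(episode: Episode) -> bool:
    """Default filter: keep every planned replay."""
    return True


def expand_dataset(
    trajectory_ids: List[str],
    objects: List[str],
    template: str,
    keep: Callable[[Episode], bool] = keep_all,
) -> List[Episode]:
    """Pair every real trajectory with every candidate object.

    Each pair stands for one simulated replay whose rendered frames would
    later be composited onto that trajectory's inpainted background.
    `keep` is a hypothetical filter applied to each planned episode.
    """
    episodes = (
        Episode(t, o, template.format(obj=o))
        for t, o in product(trajectory_ids, objects)
    )
    return [e for e in episodes if keep(e)]


if __name__ == "__main__":
    eps = expand_dataset(
        ["traj_0", "traj_1"], ["carrot", "spoon", "cup"], "put the {obj} on the plate"
    )
    print(len(eps))             # 6
    print(eps[0].instruction)   # put the carrot on the plate
```

Because the trajectories come from real data and only the objects vary, the cost of adding one more object in this sketch is one more simulated replay per trajectory rather than new real-world collection.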

🔎 Similar Papers

IRASim: Learning Interactive Real-Robot Action Simulators

2024-06-20 · arXiv.org · Citations: 25

Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation

2024-06-20 · arXiv.org · Citations: 3