Real2Render2Real: Scaling Robot Data Without Dynamics Simulation or Robot Hardware

📅 2025-05-14
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Robot learning is bottlenecked by its reliance on physics simulation or teleoperation for high-quality manipulation data. Method: This paper proposes the Real2Render2Real (R2R2R) paradigm, which generates thousands of high-fidelity, robot-agnostic 6-DoF manipulation trajectories from a single human demonstration video and smartphone-captured object RGB-D scans, requiring no dynamics modeling or physical robot hardware. It integrates 3D Gaussian Splatting reconstruction with 6-DoF motion tracking to produce renderable meshes, enabling unified modeling of both rigid and articulated objects while remaining compatible with scalable rendering engines (e.g., IsaacLab) and producing inputs for vision-language-action (VLA) models and imitation learning policies. Contribution/Results: Policies trained on data generated from just one human demonstration achieve real-robot performance on par with baselines trained on 150 teleoperated demonstrations, demonstrating substantial gains in data efficiency and cross-task generalization.
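To make the described data flow concrete, here is a minimal sketch of the generation loop in Python, assuming the three stages the summary names: mesh reconstruction from a phone scan, 6-DoF motion tracking from one demo video, and large-scale rendering of randomized demonstrations. Every function and field name (`reconstruct_mesh`, `track_object_motion`, `render_demonstrations`, the `images`/`proprio`/`actions` keys) is a hypothetical illustration of the pipeline's structure, not the authors' API; each stage is a stub showing only what goes in and what comes out.

```python
import numpy as np

# Hypothetical sketch of the R2R2R data flow described above; all names
# are illustrative, not the authors' API.

def reconstruct_mesh(phone_scan_frames):
    """3DGS reconstruction of the scanned object, converted to a mesh so it
    stays compatible with scalable rendering engines like IsaacLab."""
    vertices = np.zeros((0, 3))           # placeholder geometry
    faces = np.zeros((0, 3), dtype=int)
    return vertices, faces

def track_object_motion(demo_video_frames):
    """Recover the 6-DoF object trajectory from one human demo video."""
    return [np.eye(4)]                    # placeholder: 4x4 poses per frame

def render_demonstrations(mesh, object_poses, num_variations=1000):
    """Render robot-agnostic demonstrations by replaying the tracked
    trajectory under randomized conditions. No physics step is taken
    anywhere (collision modeling is off, per the abstract)."""
    dataset = []
    for _ in range(num_variations):
        dataset.append({"images": [], "proprio": [], "actions": []})
    return dataset

mesh = reconstruct_mesh(phone_scan_frames=[])
poses = track_object_motion(demo_video_frames=[])
demos = render_demonstrations(mesh, poses)  # feeds VLA / imitation learning
```

The property the sketch preserves is that no stage invokes dynamics simulation: all variation comes from re-rendering the single tracked trajectory.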

📝 Abstract
Scaling robot learning requires vast and diverse datasets. Yet the prevailing data collection paradigm, human teleoperation, remains costly and constrained by manual effort and physical robot access. We introduce Real2Render2Real (R2R2R), a novel approach for generating robot training data without relying on object dynamics simulation or teleoperation of robot hardware. The input is a smartphone-captured scan of one or more objects and a single video of a human demonstration. R2R2R renders thousands of high-visual-fidelity, robot-agnostic demonstrations by reconstructing detailed 3D object geometry and appearance, and tracking 6-DoF object motion. R2R2R uses 3D Gaussian Splatting (3DGS) to enable flexible asset generation and trajectory synthesis for both rigid and articulated objects, converting these representations to meshes to maintain compatibility with scalable rendering engines like IsaacLab, but with collision modeling off. Robot demonstration data generated by R2R2R integrates directly with models that operate on robot proprioceptive states and image observations, such as vision-language-action (VLA) models and imitation learning policies. Physical experiments suggest that models trained on R2R2R data from a single human demonstration can match the performance of models trained on 150 human teleoperation demonstrations. Project page: https://real2render2real.com
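The abstract's "trajectory synthesis" step admits a simple reading for rigid objects: keep the relative 6-DoF motion recovered from the single tracked demonstration and replay it from a randomized initial object pose. The sketch below is an assumed minimal version of that idea in numpy; `retarget_trajectory` and the toy episode are hypothetical, not the paper's implementation.

```python
import numpy as np

def retarget_trajectory(demo_poses, new_start):
    """Replay a tracked 6-DoF object trajectory from a new start pose by
    preserving the relative motion of the original demonstration:
        new_pose_t = new_start @ inv(demo_poses[0]) @ demo_poses[t]
    demo_poses: list of 4x4 homogeneous object poses from the human video.
    new_start:  4x4 initial pose sampled for a synthetic episode.
    """
    demo_start_inv = np.linalg.inv(demo_poses[0])
    return [new_start @ demo_start_inv @ pose for pose in demo_poses]

# Toy episode: the demo moves the object 30 cm along x; the synthetic
# episode starts 10 cm along x, so it should end at 40 cm.
demo = [np.eye(4), np.eye(4)]
demo[1][0, 3] = 0.3
start = np.eye(4)
start[0, 3] = 0.1
synthetic = retarget_trajectory(demo, start)
assert np.isclose(synthetic[1][0, 3], 0.4)
```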
Problem

Research questions and friction points this paper is trying to address.

Human teleoperation for data collection is costly and constrained by manual effort and physical robot access
Dynamics simulation requires accurate physics modeling of object interactions
Scaling robot learning demands vast, diverse datasets that manual collection cannot supply
Innovation

Methods, ideas, or system contributions that make the work stand out.

Requires only smartphone object scans and a single human demonstration video
Leverages 3D Gaussian Splatting for asset reconstruction and trajectory synthesis (see the sketch after this list)
Renders thousands of robot-agnostic demonstrations without dynamics simulation or robot hardware
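For background on the representation the second bullet names: a 3DGS scene is a set of splats, each with a mean position, an anisotropic covariance conventionally factored as Sigma = R S S^T R^T (rotation times squared per-axis scales), an opacity, and a view-dependent color. The sketch below shows that standard per-splat parameterization in numpy; it illustrates 3DGS in general, not this paper's specific pipeline, and `quat_to_rot`/`gaussian_covariance` are illustrative names.

```python
import numpy as np

def quat_to_rot(q):
    """Unit quaternion (w, x, y, z) -> 3x3 rotation matrix."""
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def gaussian_covariance(scale, quat):
    """3D covariance of one splat, using the standard 3DGS factorization
    Sigma = R S S^T R^T with S = diag(per-axis scales)."""
    R = quat_to_rot(quat)
    S = np.diag(scale)
    return R @ S @ S.T @ R.T

# One splat: mean position plus covariance; real splats also carry an
# opacity and view-dependent color coefficients.
mean = np.array([0.0, 0.0, 1.0])
cov = gaussian_covariance(scale=[0.02, 0.01, 0.01],
                          quat=[1.0, 0.0, 0.0, 0.0])  # identity rotation
```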