RoboCurate: Harnessing Diversity with Action-Verified Neural Trajectory for Robot Learning

📅 2026-02-21
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work proposes RoboCurate, a novel framework that addresses the challenge of physically implausible motions in synthetic video data, which often degrades robot learning performance and is difficult for existing vision-language models to assess. RoboCurate introduces, for the first time, a simulation-based motion validation mechanism: it replays generated actions in a physics simulator and evaluates motion consistency between the synthetic videos and the simulated trajectories, retaining only high-quality action annotations. To further enhance observational diversity, the framework integrates motion-preserving video translation and image editing techniques. Empirical results demonstrate significant improvements in downstream tasks, outperforming purely real-data approaches by 70.1% on GR-1 tabletop tasks, 16.1% in DexMimicGen pretraining, and 179.9% on the ALLEX humanoid dexterous manipulation benchmark.

📝 Abstract
Synthetic data generated by video generative models has shown promise for robot learning as a scalable pipeline, but it often suffers from inconsistent action quality due to imperfectly generated videos. Recently, vision-language models (VLMs) have been leveraged to validate video quality, but they struggle to distinguish physically accurate videos from implausible ones and, even then, cannot directly evaluate the generated actions themselves. To tackle this issue, we introduce RoboCurate, a novel synthetic robot data generation framework that evaluates and filters the quality of annotated actions by comparing them against a simulation replay. Specifically, RoboCurate replays the predicted actions in a simulator and assesses action quality by measuring the motion consistency between the simulator rollout and the generated video. In addition, we unlock observation diversity beyond the available dataset via image-to-image editing and apply action-preserving video-to-video transfer to further augment appearance. We observe that RoboCurate's generated data yields substantial relative improvements in success rates over using real data only: +70.1% on GR-1 Tabletop (300 demos), +16.1% on DexMimicGen in the pre-training setup, and +179.9% in the challenging real-world ALLEX humanoid dexterous manipulation setting.
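The replay-and-filter idea in the abstract can be sketched in a few lines. Note this is a hypothetical illustration, not the paper's implementation: the `simulate` callback, the `episodes` schema, the per-step Euclidean distance metric, and the `threshold` value are all assumptions standing in for the (unspecified) physics simulator and motion-consistency measure.

```python
import math


def replay_consistency(video_traj, sim_traj):
    """Mean per-step distance between the motion estimated from the
    generated video and the simulator rollout. Lower = more consistent.
    (Hypothetical metric; the paper's exact measure is not given here.)"""
    assert len(video_traj) == len(sim_traj)
    dists = [math.dist(a, b) for a, b in zip(video_traj, sim_traj)]
    return sum(dists) / len(dists)


def filter_episodes(episodes, simulate, threshold=0.05):
    """Keep only episodes whose annotated actions, when replayed in the
    simulator, reproduce the video motion within `threshold`
    (hypothetical default)."""
    kept = []
    for ep in episodes:
        sim_traj = simulate(ep["actions"])  # physics replay of predicted actions
        score = replay_consistency(ep["video_traj"], sim_traj)
        if score <= threshold:
            kept.append(ep)
    return kept
```

In this sketch an episode passes only if the simulator, driven by the annotated actions, traces out roughly the same motion that was estimated from the synthetic video; episodes where generation and action labels disagree are discarded before training.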
Problem

Research questions and friction points this paper is trying to address.

synthetic data
robot learning
action quality
video generation
physical accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

action-verified trajectory
simulation replay
synthetic robot data
video-to-video transfer
observation diversity