InternData-A1: Pioneering High-Fidelity Synthetic Data for Pre-training Generalist Policy

📅 2025-11-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates whether purely synthetic data can substitute for real-robot data in pretraining vision-language-action (VLA) models. To this end, we design the first high-fidelity, fully automated simulation pipeline supporting multimodal, multi-skill, long-horizon embodied tasks, enabling large-scale, fully decoupled, composable, and annotation-free data generation for embodied intelligence. Using data from this pipeline, we perform end-to-end pretraining with the same architecture as π₀. Experiments show that our model matches π₀'s performance across 49 simulated tasks, 5 real-world tasks, and 4 dexterous long-horizon tasks, while exhibiting strong zero-shot cross-domain generalization. This is the first empirical validation that high-quality synthetic data is sufficient and effective for pretraining a general-purpose VLA policy.

📝 Abstract
Recent works explore how real and synthetic data contribute to Vision-Language-Action (VLA) models' generalization. While current VLA models have shown the strong effectiveness of large-scale real-robot pre-training, synthetic data has not previously demonstrated comparable capability at scale. This paper provides the first evidence that synthetic data alone can match the performance of the strongest $π$-dataset in pre-training a VLA model, revealing the substantial value of large-scale simulation. The resulting model also exhibits surprising zero-shot sim-to-real transfer on several challenging tasks. Our synthetic dataset, InternData-A1, contains over 630k trajectories and 7,433 hours across 4 embodiments, 18 skills, 70 tasks, and 227 scenes, covering rigid, articulated, deformable, and fluid-object manipulation. It is generated through a highly autonomous, fully decoupled, and compositional simulation pipeline that enables long-horizon skill composition, flexible task assembly, and heterogeneous embodiments with minimal manual tuning. Using the same architecture as $π_0$, we pre-train a model entirely on InternData-A1 and find that it matches the official $π_0$ across 49 simulation tasks, 5 real-world tasks, and 4 long-horizon dexterous tasks. We release the dataset and will open-source the generation pipeline to broaden access to large-scale robotic data and to lower the barrier to scalable data creation for embodied AI research.
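The abstract describes a decoupled, compositional generation pipeline in which skills, scenes, and embodiments are recombined into long-horizon tasks with automatic annotation. The sketch below only illustrates that idea; the Skill, Scene, and Embodiment abstractions and the compose_task helper are hypothetical and are not the InternData-A1 API.

```python
# Minimal conceptual sketch of decoupled, compositional trajectory generation.
# All names below are illustrative assumptions, not the paper's actual pipeline.
from dataclasses import dataclass
from typing import Callable, List
import random

@dataclass
class Scene:
    name: str
    objects: List[str]

@dataclass
class Embodiment:
    name: str  # e.g. "single_arm", "dual_arm_dexterous"

@dataclass
class Skill:
    name: str  # e.g. "pick", "pour", "open_drawer"
    execute: Callable[[Scene, Embodiment], List[dict]]  # returns a trajectory segment

def compose_task(skills: List[Skill], scene: Scene, embodiment: Embodiment) -> List[dict]:
    """Chain skill primitives into one long-horizon trajectory.

    Because skills, scenes, and embodiments vary independently, new tasks are
    assembled by recombination, and each step is labeled with the skill that
    produced it, so annotation comes for free.
    """
    trajectory = []
    for skill in skills:
        segment = skill.execute(scene, embodiment)
        for step in segment:
            step["skill"] = skill.name  # automatic per-step annotation
        trajectory.extend(segment)
    return trajectory

# Toy usage with a stand-in executor that emits a single 7-DoF action step.
def dummy_exec(scene: Scene, embodiment: Embodiment) -> List[dict]:
    return [{"scene": scene.name, "embodiment": embodiment.name, "action": [0.0] * 7}]

skills = [Skill("pick", dummy_exec), Skill("place", dummy_exec)]
task = random.sample(skills, k=len(skills))  # flexible task assembly
traj = compose_task(task, Scene("kitchen_01", ["cup", "plate"]), Embodiment("single_arm"))
print(len(traj), [s["skill"] for s in traj])
```

The point of the sketch is the design property the paper credits for scale: varying each axis independently means adding a scene, skill, or embodiment multiplies the task space without per-task manual tuning.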
Problem

Research questions and friction points this paper is trying to address.

Demonstrating that synthetic data can match real-robot pre-training performance for VLA models
Creating a scalable simulation pipeline for diverse robotic manipulation tasks and embodiments
Enabling zero-shot sim-to-real transfer on challenging robotic manipulation tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Synthetic data matches real-data performance in pre-training
Autonomous simulation pipeline enables scalable trajectory generation
Zero-shot sim-to-real transfer achieved across diverse manipulation tasks
👥 Authors
Yang Tian (Shanghai AI Laboratory, Peking University)
Yuyin Yang (Shanghai AI Laboratory)
Yiman Xie (Shanghai AI Laboratory)
Zetao Cai (Shanghai AI Laboratory)
Xu Shi (University of Michigan)
Ning Gao (Shanghai AI Laboratory)
Hangxu Liu (Shanghai AI Laboratory)
Xuekun Jiang (Shanghai AI Laboratory)
Zherui Qiu (Shanghai AI Laboratory)
Feng Yuan (Postdoctoral Fellow, Computer Science and Engineering, The Chinese University of Hong Kong)
Yaping Li (Shanghai AI Laboratory)
Ping Wang (Peking University)
Junhao Cai (Shanghai AI Lab, HKUST)
Jia Zeng (Shanghai AI Laboratory)
Hao Dong (Peking University)
Jiangmiao Pang (Shanghai AI Laboratory)