Genie Sim 3.0: A High-Fidelity Comprehensive Simulation Platform for Humanoid Robot

📅 2026-01-05
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing simulation platforms struggle to support effective sim-to-real transfer because of fragmentation, limited scenario diversity, or insufficient fidelity. To address this, this work presents a unified, high-fidelity humanoid robot simulation platform featuring a novel large language model (LLM)-driven natural language interface for procedural scene generation and an automated evaluation pipeline powered by vision-language models (VLMs). The platform enables large-scale synthesis of diverse manipulation scenarios and policy training data, and is accompanied by an open-source dataset comprising over 10,000 hours of synthetic experience. Experimental results show that, under controlled conditions, the synthetic data can effectively substitute for real-world data in scalable policy training and achieve robust zero-shot sim-to-real transfer.

📝 Abstract
The development of robust and generalizable robot learning models is critically contingent upon the availability of large-scale, diverse training data and reliable evaluation benchmarks. Collecting data in the physical world poses prohibitive costs and scalability challenges, and prevailing simulation benchmarks frequently suffer from fragmentation, narrow scope, or insufficient fidelity to enable effective sim-to-real transfer. To address these challenges, we introduce Genie Sim 3.0, a unified simulation platform for robotic manipulation. We present Genie Sim Generator, a large language model (LLM)-powered tool that constructs high-fidelity scenes from natural language instructions. Its principal strength lies in rapid, multi-dimensional generalization, which facilitates the synthesis of diverse environments to support scalable data collection and robust policy evaluation. We introduce the first benchmark that pioneers the application of LLMs to automated evaluation. It leverages an LLM to mass-generate evaluation scenarios and employs a Vision-Language Model (VLM) to establish an automated assessment pipeline. We also release an open-source dataset comprising more than 10,000 hours of synthetic data across over 200 tasks. Through systematic experimentation, we validate the robust zero-shot sim-to-real transfer capability of our open-source dataset, demonstrating that, under controlled conditions, synthetic data can serve as an effective substitute for real-world data in scalable policy training. For code and dataset details, please refer to: https://github.com/AgibotTech/genie_sim.
Problem

Research questions and friction points this paper is trying to address.

robot learning
simulation fidelity
sim-to-real transfer
training data scalability
evaluation benchmark
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-powered simulation
high-fidelity synthetic data
automated evaluation benchmark
sim-to-real transfer
Vision-Language Model (VLM)
Authors

Chenghao Yin
Da Huang
Di Yang
Jichao Wang
Nanshu Zhao
Chen Xu
Wenjun Sun
Linjie Hou
Zhijun Li
Junhui Wu
Zhaobo Liu
Zhen Xiao (Peking University; distributed systems, cloud computing, machine learning)
Shenglan Zhang
Lei Bao
Rui Feng
Zhenquan Pang
Jiayu Li
Qian Wang
Maoqing Yao (Google)