GigaBrain-0: A World Model-Powered Vision-Language-Action Model

πŸ“… 2025-10-22
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
To address the high cost of real-world robotic data and the limited generalization of generic Vision-Language-Action (VLA) models, this paper proposes a world model–driven data synthesis framework. The method jointly reasons over spatial geometry, object states, and long-horizon dependencies via RGB-D input modeling and embodied Chain-of-Thought supervision. Leveraging a learned world model, it generates synthetic videos, multi-view observations, and sim-to-real transfer samples to support both vision-language pretraining and dexterous manipulation policy learning. The approach substantially reduces reliance on real robot data while maintaining strong real-world performance under significant variations in appearance, scene layout, and viewpoint. The paper further introduces GigaBrain-0-Small, a lightweight VLA model optimized for efficient deployment on edge devices such as the NVIDIA Jetson AGX Orin. Experimental results demonstrate improved data efficiency, robust cross-domain generalization, and practical applicability in resource-constrained robotic systems.

πŸ“ Abstract
Training Vision-Language-Action (VLA) models for generalist robots typically requires large-scale real-world robot data, which is expensive and time-consuming to collect. The inefficiency of physical data collection severely limits the scalability and generalization capacity of current VLA systems. To address this challenge, we introduce GigaBrain-0, a novel VLA foundation model empowered by world model-generated data (e.g., video generation, real2real transfer, human transfer, view transfer, and sim2real transfer data). By leveraging world models to generate diverse data at scale, GigaBrain-0 significantly reduces reliance on real robot data while improving cross-task generalization. Our approach further improves policy robustness through RGBD input modeling and embodied Chain-of-Thought (CoT) supervision, enabling the model to reason about spatial geometry, object states, and long-horizon dependencies during task execution. This leads to substantial gains in real-world performance on dexterous, long-horizon, and mobile manipulation tasks. Extensive experiments demonstrate that GigaBrain-0 achieves superior generalization across variations in appearances (e.g., textures, colors), object placements, and camera viewpoints. Additionally, we present GigaBrain-0-Small, an optimized lightweight variant designed to run efficiently on devices such as the NVIDIA Jetson AGX Orin.
Problem

Research questions and friction points this paper is trying to address.

Reduces reliance on expensive real robot data collection
Improves cross-task generalization for vision-language-action models
Enhances policy robustness through RGBD input and reasoning supervision
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses world model-generated data for training
Improves robustness with RGBD and CoT supervision
Achieves generalization across appearance and viewpoint variations
GigaBrain Team
GigaAI
Angen Ye
GigaAI
Boyuan Wang
Institute of Automation, Chinese Academy of Sciences
Computer Vision, AIGC, World Model, Embodied AI
Chaojun Ni
GigaAI
Guan Huang
GigaAI
Guosheng Zhao
Institute of Automation, Chinese Academy of Sciences
Haoyun Li
Institute of Automation, Chinese Academy of Sciences
Computer Vision
Jie Li
GigaAI
Jiagang Zhu
GigaAI
Lv Feng
GigaAI
Peng Li
GigaAI
Qiuping Deng
GigaAI
Runqi Ouyang
GigaAI
Wenkang Qin
Peking University
Xinze Chen
Xiaofeng Wang
GigaAI
Yang Wang
GigaAI
Yifan Li
GigaAI
Yilong Li
PhD, Stanford University
Operating Systems, Distributed Systems, Datacenter Computing, Networking
Yiran Ding
HDU
LLM, MLSys
Yuan Xu
GigaAI
Yun Ye
Intel
Computer Vision, Deep Learning, Semiconductor Physics
Yukun Zhou
GigaAI
Zhehao Dong
GigaAI
Zhenan Wang
GigaAI