HoloBrain-0 Technical Report

📅 2026-02-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes a lightweight vision-language-action (VLA) framework to bridge the gap between foundation models and reliable robotic deployment, explicitly integrating robot embodiment priors—such as multi-view camera parameters and URDF-based kinematics—into the VLA architecture for the first time. Leveraging a two-stage “pre-training + post-training” paradigm, the model achieves performance on par with significantly larger counterparts despite having only 0.2 billion parameters. It attains state-of-the-art results on simulation benchmarks including RoboTwin 2.0, LIBERO, and GenieSim, while demonstrating strong 3D spatial reasoning, cross-morphology adaptability, and low-latency edge deployment in real-world long-horizon tasks. The complete toolchain is publicly released.

📝 Abstract
In this work, we introduce HoloBrain-0, a comprehensive Vision-Language-Action (VLA) framework that bridges the gap between foundation model research and reliable real-world robot deployment. The core of our system is a novel VLA architecture that explicitly incorporates robot embodiment priors, including multi-view camera parameters and kinematic descriptions (URDF), to enhance 3D spatial reasoning and support diverse embodiments. We validate this design through a scalable "pre-train then post-train" paradigm, achieving state-of-the-art results on simulation benchmarks such as RoboTwin 2.0, LIBERO, and GenieSim, as well as strong results on challenging long-horizon real-world manipulation tasks. Notably, our efficient 0.2B-parameter variant rivals significantly larger baselines, enabling low-latency on-device deployment. To further accelerate research and practical adoption, we fully open-source the entire HoloBrain ecosystem, which includes: (1) powerful pre-trained VLA foundations; (2) post-trained checkpoints for multiple simulation suites and real-world tasks; and (3) RoboOrchard, a full-stack VLA infrastructure for data curation, model training, and deployment. Together with standardized data collection protocols, this release provides the community with a complete, reproducible path toward high-performance robotic manipulation.
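To make the notion of "embodiment priors" concrete, here is a minimal, illustrative sketch of how per-view camera parameters and URDF-style joint limits could be flattened into a single conditioning vector for a policy network. All names, dimensions, and the encoding itself are hypothetical assumptions for illustration, not the paper's actual architecture.

```python
import numpy as np

# Illustrative only: encode simple embodiment priors -- per-view camera
# parameters and joint limits -- into one flat conditioning vector.
# The real HoloBrain-0 encoding is not described at this level of detail here.

def camera_prior(fx, fy, cx, cy, extrinsic):
    """Flatten a 3x3 intrinsic matrix K and a 4x4 camera extrinsic."""
    K = np.array([[fx, 0.0, cx],
                  [0.0, fy, cy],
                  [0.0, 0.0, 1.0]])
    return np.concatenate([K.ravel(), np.asarray(extrinsic, dtype=float).ravel()])

def kinematic_prior(joint_limits):
    """Flatten per-joint (lower, upper) limits, e.g. parsed from a URDF."""
    return np.asarray(joint_limits, dtype=float).ravel()

# Hypothetical two-view setup with a 7-DoF arm.
views = [camera_prior(600, 600, 320, 240, np.eye(4)),
         camera_prior(600, 600, 320, 240, np.eye(4))]
limits = kinematic_prior([(-3.14, 3.14)] * 7)

# Each view contributes 9 + 16 = 25 values; the limits add 14 more.
embodiment_vector = np.concatenate(views + [limits])
print(embodiment_vector.shape)  # (64,)
```

Such a vector could then be concatenated with visual and language features before action decoding; the point is only that camera calibration and kinematics are explicit inputs rather than something the model must infer from pixels.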
Problem

Research questions and friction points this paper is trying to address.

Vision-Language-Action
robot embodiment
3D spatial reasoning
real-world robot deployment
long-horizon manipulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-Language-Action (VLA)
embodiment priors
3D spatial reasoning
on-device deployment
open-source robotics ecosystem
Authors
Xuewu Lin (Horizon Robotics)
Tianwei Lin (Zhejiang University; MLLMs)
Yun Du (Horizon Robotics)
Hongyu Xie (Horizon Robotics)
Yiwei Jin (Horizon Robotics)
Jiawei Li (Horizon Robotics)
Shijie Wu (Horizon Robotics)
Qingze Wang (Horizon Robotics)
Mengdi Li (King Abdullah University of Science and Technology; Reinforcement Learning, LLMs, Robotics)
Mengao Zhao (Horizon Robotics)
Ziang Li (Horizon Robotics)
Chaodong Huang (Horizon Robotics)
Hongzhe Bi (Horizon Robotics)
Lichao Huang (Senior Engineer, Horizon Robotics Inc; Computer Vision, Machine Learning)
Zhizhong Su (Horizon Robotics; Deep Learning, Computer Vision, Autonomous Driving, Robotics Learning)