OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation

📅 2025-05-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Current dual-system vision-language-action (VLA) architectures for embodied intelligence lack open-source benchmarks and reproducible analytical frameworks. Method: We conduct the first structured comparative study and empirical attribution analysis, and propose a lightweight, modular, and extensible open-source VLA paradigm. Our approach integrates a ViT-based visual encoder, a parameter-efficient LLM language decoder, an action head, and a cross-modal alignment mechanism, trained via joint instruction tuning and imitation learning. Contribution/Results: Evaluated on RT-2 and Open-X Embodiment benchmarks, our model has fewer than 1B parameters and an inference latency under 120 ms while maintaining competitive performance. We fully open-source the codebase, pre-trained weights, and an integrated evaluation toolkit to support community-driven development, reproducible experimentation, and continued progress in VLA research.

📝 Abstract
Dual-system VLA (Vision-Language-Action) architectures have become a hot topic in embodied intelligence research, but too little open-source work exists to support further performance analysis and optimization. To address this problem, the paper summarizes and compares the structural designs of existing dual-system architectures and conducts systematic empirical evaluations of their core design elements. It also provides a low-cost open-source model for further exploration. The project will continue to be updated with additional experimental conclusions and open-source models with improved performance. Project page: https://openhelix-robot.github.io/.
Problem

Research questions and friction points this paper is trying to address.

Lack of open-source dual-system VLA models for robotics
Need for empirical analysis of dual-system VLA architectures
Need for low-cost open-source robotic manipulation solutions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Summarizes and compares dual-system VLA architectures
Conducts systematic empirical evaluations of core design elements
Provides a low-cost open-source model for further exploration