🤖 AI Summary
Current dual-system vision-language-action (VLA) architectures for embodied intelligence lack open-source benchmarks and reproducible analytical frameworks. Method: We conduct the first structured comparative study and empirical attribution analysis, proposing a lightweight, modular, and extensible open-source VLA paradigm. Our approach integrates a ViT-based visual encoder, a parameter-efficient LLM language decoder, an action head, and a cross-modal alignment mechanism, trained via joint instruction tuning and imitation learning. Contribution/Results: Evaluated on RT-2 and Open-X Embodiment benchmarks, our model uses fewer than 1B parameters and runs at under 120 ms inference latency while maintaining competitive performance. We fully open-source the codebase, pre-trained weights, and an integrated evaluation toolkit, enabling community-driven development, reproducible experimentation, and continuous advancement of VLA research.
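The four components named above (vision encoder, language decoder, cross-modal alignment, action head) can be sketched as a minimal skeleton. This is an illustrative sketch only, not the authors' implementation: all module sizes, names, and the `TinyVLA` class itself are hypothetical placeholders.

```python
# Hypothetical minimal dual-system VLA skeleton (illustration only).
import torch
import torch.nn as nn

class TinyVLA(nn.Module):
    def __init__(self, patch_dim=64, vocab=1000, d_model=128, action_dim=7):
        super().__init__()
        # Stand-in for a ViT visual encoder: patch embedding + one transformer layer.
        self.patch_embed = nn.Linear(patch_dim, d_model)
        self.vis_encoder = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        # Stand-in for a parameter-efficient LLM language decoder.
        self.tok_embed = nn.Embedding(vocab, d_model)
        self.lm_decoder = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        # Cross-modal alignment: project vision tokens into the language space.
        self.align = nn.Linear(d_model, d_model)
        # Action head: regress a continuous action vector (e.g. end-effector deltas).
        self.action_head = nn.Linear(d_model, action_dim)

    def forward(self, patches, tokens):
        v = self.align(self.vis_encoder(self.patch_embed(patches)))  # (B, P, d)
        t = self.tok_embed(tokens)                                   # (B, T, d)
        h = self.lm_decoder(t, v)           # language tokens attend to vision tokens
        return self.action_head(h[:, -1])   # predict action from the last token

model = TinyVLA()
patches = torch.randn(2, 16, 64)           # batch of 2 images, 16 patches each
tokens = torch.randint(0, 1000, (2, 8))    # instruction token ids
actions = model(patches, tokens)           # shape (2, 7)
```

In a real dual-system setup the vision-language stack would be a pre-trained VLM and the action head would be trained via imitation learning on demonstration trajectories; this sketch only shows how the modules connect.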
📝 Abstract
Dual-system VLA (Vision-Language-Action) architectures have become a hot topic in embodied intelligence research, but there is a lack of open-source work sufficient for further performance analysis and optimization. To address this problem, this paper summarizes and compares the structural designs of existing dual-system architectures and conducts systematic empirical evaluations of their core design elements, ultimately providing a low-cost open-source model for further exploration. This project will be continuously updated with additional experimental conclusions and open-source models with improved performance. Project page: https://openhelix-robot.github.io/.