🤖 AI Summary
To address critical challenges in evaluating Vision-Language-Action (VLA) models on real robots for embodied intelligence, including poor scalability, irreproducibility, and the lack of standardized benchmarks, this paper introduces the first standardized online evaluation framework enabling large-scale parallel testing on physical robots. Methodologically, we construct a distributed robot cluster that integrates containerized model deployment, automated task scheduling, structured evaluation protocols, and real-time performance monitoring, establishing an end-to-end closed-loop experimental pipeline. Our contributions are threefold: (1) an open, reproducible real-robot evaluation benchmark; (2) a tenfold increase in test throughput, substantially improving cross-model comparability; and (3) systematic empirical validation of the generalization capability and robustness of multiple state-of-the-art VLA models across diverse physical tasks.
📝 Abstract
Testing on real machines is indispensable for robotic control algorithms. In the context of learning-based algorithms, especially VLA models, the demand for large-scale evaluation, i.e., testing a large number of models on a large number of tasks, is becoming increasingly urgent. However, doing this right is highly non-trivial, especially when scalability and reproducibility are taken into account. In this report, we describe our methodology for constructing RoboChallenge, an online evaluation system for testing robotic control algorithms, and our survey of recent state-of-the-art VLA models using our initial benchmark, Table30.
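To make the closed-loop pipeline described above concrete, the sketch below shows one way such an evaluation matrix could be scheduled: a queue of (model, task) jobs dispatched in parallel to a pool of robots, with per-job outcomes recorded for later comparison. This is a minimal, hypothetical illustration only; the class names (`EvalJob`, `robot_worker`), endpoints, and randomized scoring are assumptions for exposition and do not reflect the actual RoboChallenge implementation or API.

```python
# Hypothetical sketch: schedule (model, task) evaluation jobs across a pool of
# robots and record outcomes. All names and endpoints are illustrative.
from dataclasses import dataclass
from queue import Queue
from threading import Thread
import random
import time


@dataclass
class EvalJob:
    model_endpoint: str  # URL of a containerized policy server (assumed deployment style)
    task_id: str         # benchmark task name, e.g. a Table30 task
    episodes: int = 5    # repeated trials per (model, task) pair


@dataclass
class EvalResult:
    job: EvalJob
    successes: int
    duration_s: float


def run_episode(model_endpoint: str, task_id: str) -> bool:
    """Placeholder for one physical rollout: query the policy server for actions,
    execute them on the robot, and score task completion. Randomized here."""
    time.sleep(0.01)
    return random.random() < 0.5


def robot_worker(robot_name: str, jobs: Queue, results: list) -> None:
    """One robot: repeatedly pull a job, run its episodes, and log the result."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel: no more work
            return
        start = time.time()
        successes = sum(run_episode(job.model_endpoint, job.task_id)
                        for _ in range(job.episodes))
        results.append(EvalResult(job, successes, time.time() - start))
        print(f"[{robot_name}] {job.task_id} x{job.episodes}: {successes} successes")


if __name__ == "__main__":
    models = ["http://model-a:8000", "http://model-b:8000"]  # assumed endpoints
    tasks = [f"task_{i:02d}" for i in range(4)]              # stand-in for benchmark tasks
    robots = ["robot_0", "robot_1", "robot_2"]               # parallel physical robots

    jobs: Queue = Queue()
    results: list = []
    for m in models:
        for t in tasks:
            jobs.put(EvalJob(model_endpoint=m, task_id=t))

    threads = [Thread(target=robot_worker, args=(r, jobs, results)) for r in robots]
    for th in threads:
        th.start()
    for _ in threads:
        jobs.put(None)  # one sentinel per worker, queued after all real jobs
    for th in threads:
        th.join()

    print(f"Completed {len(results)} (model, task) evaluations on {len(robots)} robots.")
```

In this toy setup, throughput scales with the number of robot workers pulling from the shared job queue, which is the intuition behind the parallel, large-scale testing the framework targets; the real system would additionally handle container lifecycle, task resets, and monitoring.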