AI Summary
Traditional robotics software frameworks struggle to support large-scale training of general-purpose policies, and robot simulators offer only limited support for moving between simulation and real-world experiments. To address these challenges, this paper introduces Robot Control Stack (RCS), a lightweight, modular robot learning ecosystem built on a layered architecture with a unified interface for simulated and physical robots. Despite its small footprint and few dependencies, RCS supports both real-world experiments and large-scale training in simulation. The paper evaluates RCS along the development cycle of vision-language-action (VLA) and reinforcement learning (RL) policies, including the VLA models Octo, OpenVLA, and Pi Zero on multiple robots, and examines how simulation data can improve real-world policy performance. RCS thus aims to provide an extensible infrastructure for large-scale training and practical deployment of generalist robot policies.
Abstract
Vision-Language-Action models (VLAs) mark a major shift in robot learning. They replace specialized architectures and task-tailored components of expert policies with large-scale data collection and setup-specific fine-tuning. In this machine learning-focused workflow that is centered around models and scalable training, traditional robotics software frameworks become a bottleneck, while robot simulations offer only limited support for transitioning from and to real-world experiments. In this work, we close this gap by introducing Robot Control Stack (RCS), a lean ecosystem designed from the ground up to support research in robot learning with large-scale generalist policies. At its core, RCS features a modular and easily extensible layered architecture with a unified interface for simulated and physical robots, facilitating sim-to-real transfer. Despite its minimal footprint and dependencies, it offers a complete feature set, enabling both real-world experiments and large-scale training in simulation. Our contribution is twofold: First, we introduce the architecture of RCS and explain its design principles. Second, we evaluate its usability and performance along the development cycle of VLA and RL policies. Our experiments also provide an extensive evaluation of Octo, OpenVLA, and Pi Zero on multiple robots and shed light on how simulation data can improve real-world policy performance. Our code, datasets, weights, and videos are available at: https://robotcontrolstack.github.io/
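To illustrate what a unified interface for simulated and physical robots can look like, the following is a minimal Python sketch. It is not the RCS API: the names `Robot`, `SimRobot`, `RecordingRobot`, `run_episode`, and `reach_policy` are hypothetical, and the "hardware" backend only records commands instead of talking to a device. The point is that the same control loop runs unchanged against either backend.

```python
from abc import ABC, abstractmethod
from typing import Callable, List

Vector = List[float]


class Robot(ABC):
    """Hypothetical unified interface; an assumption, not the actual RCS API."""

    @abstractmethod
    def get_joint_positions(self) -> Vector: ...

    @abstractmethod
    def set_joint_targets(self, targets: Vector) -> None: ...


class SimRobot(Robot):
    """Toy simulated backend: each joint moves a fixed fraction toward its target."""

    def __init__(self, num_joints: int, gain: float = 0.5) -> None:
        self._q = [0.0] * num_joints
        self._gain = gain

    def get_joint_positions(self) -> Vector:
        return list(self._q)

    def set_joint_targets(self, targets: Vector) -> None:
        self._q = [q + self._gain * (t - q) for q, t in zip(self._q, targets)]


class RecordingRobot(Robot):
    """Stand-in for a hardware backend: records commands instead of sending them."""

    def __init__(self, num_joints: int) -> None:
        self._q = [0.0] * num_joints
        self.sent: List[Vector] = []

    def get_joint_positions(self) -> Vector:
        return list(self._q)

    def set_joint_targets(self, targets: Vector) -> None:
        self.sent.append(list(targets))  # a real driver would transmit here
        self._q = list(targets)          # assume perfect tracking in this sketch


Policy = Callable[[Vector], Vector]


def run_episode(robot: Robot, policy: Policy, steps: int) -> Vector:
    """Run the same control loop against any backend implementing Robot."""
    for _ in range(steps):
        robot.set_joint_targets(policy(robot.get_joint_positions()))
    return robot.get_joint_positions()


def reach_policy(goal: Vector) -> Policy:
    """Trivial policy that always commands the goal configuration."""
    return lambda _obs: list(goal)
```

In this sketch, switching from simulation to the stand-in hardware backend means passing a different `Robot` instance to `run_episode`; the policy and the loop stay the same.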