RIO: Flexible Real-Time Robot I/O for Cross-Embodiment Robot Learning

📅 2026-05-12

🤖 AI Summary

Cross-embodiment robot learning is held back by the limited reusability of data, policies, and control code, because most robot infrastructure is highly customized and fragmented. To address this, the work proposes RIO, an open-source, lightweight, and modular Python framework that uses a unified abstraction layer to let users switch between hardware platforms and tasks with minimal reconfiguration. RIO supports multi-platform robot control, teleoperation, sensor configuration, data standardization, and deployment of vision-language-action (VLA) policies. The framework has been validated on three robot morphologies (single-arm, bimanual, and humanoid) and four hardware platforms with varying grippers and cameras, lowering the barrier to cross-platform reuse. Using teleoperated data collected with RIO, the authors fine-tune state-of-the-art VLA models such as π₀.₅ and GR00T on household tasks including pick-and-place, folding, and bowl scrubbing, with the aim of helping the community accelerate robot learning on real-world hardware.
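The central idea of a unified abstraction layer, where switching hardware becomes a configuration change rather than a code rewrite, can be illustrated with a minimal Python sketch. Everything below is hypothetical and is not RIO's actual API: `RobotInterface`, `SimArm`, `ROBOT_REGISTRY`, `make_robot`, and `policy_step` are illustrative names, the simulated arms stand in for real drivers, and the action dimensions assume 7 joints plus 1 gripper value per arm. The point is that code written against the interface runs unchanged on a single-arm or a bimanual embodiment; only the action dimension differs.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List


class RobotInterface(ABC):
    """Hypothetical embodiment-agnostic contract (not RIO's actual API)."""

    @property
    @abstractmethod
    def action_dim(self) -> int:
        """Number of values expected by send_action."""

    @abstractmethod
    def get_state(self) -> List[float]:
        """Return the current proprioceptive state."""

    @abstractmethod
    def send_action(self, action: List[float]) -> None:
        """Command the robot with one action vector."""


class SimArm(RobotInterface):
    """Simulated stand-in for a real driver: the state simply tracks the last action."""

    def __init__(self, dof: int) -> None:
        self._dof = dof
        self._q = [0.0] * dof

    @property
    def action_dim(self) -> int:
        return self._dof

    def get_state(self) -> List[float]:
        return list(self._q)

    def send_action(self, action: List[float]) -> None:
        if len(action) != self._dof:
            raise ValueError(f"expected {self._dof} values, got {len(action)}")
        self._q = list(action)


# Hypothetical registry: switching embodiment is a change of config string.
# Dimensions assume 7 joints + 1 gripper per arm.
ROBOT_REGISTRY: Dict[str, Callable[[], RobotInterface]] = {
    "single_arm": lambda: SimArm(dof=8),
    "bimanual": lambda: SimArm(dof=16),
}


def make_robot(name: str) -> RobotInterface:
    """Build a robot from its config name, with a helpful error for unknown names."""
    if name not in ROBOT_REGISTRY:
        raise ValueError(f"unknown robot {name!r}; options: {sorted(ROBOT_REGISTRY)}")
    return ROBOT_REGISTRY[name]()


def policy_step(robot: RobotInterface, policy: Callable[[List[float]], List[float]]) -> None:
    """Embodiment-agnostic step: read state, query the policy, send the action."""
    robot.send_action(policy(robot.get_state()))


if __name__ == "__main__":
    nudge = lambda state: [x + 0.1 for x in state]  # toy policy
    for name in ("single_arm", "bimanual"):
        robot = make_robot(name)
        policy_step(robot, nudge)
        print(name, robot.action_dim, robot.get_state()[:2])
```

A registry of constructors keyed by name is one simple way to make the embodiment a config value; a real framework would also have to reconcile differing observation spaces, camera sets, and control modes, which this sketch leaves out.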

📝 Abstract

Despite recent efforts to collect multi-task, multi-embodiment datasets, to design recipes for training Vision-Language-Action models (VLAs), and to showcase these models on different robot platforms, generalist cross-embodiment robot capabilities remain a largely elusive ideal. Progress is limited by fragmented infrastructure: most robot code is highly specific to the exact setup the user decided on, which adds major overhead when attempting to reuse, recycle, or share artifacts between users. We present RIO (Robot I/O), an open-source Python framework that provides flexible, lightweight components for robot control, teleoperation, data formatting, sensor configuration, and policy deployment across diverse hardware platforms and morphologies. RIO provides abstractions that let users make any of these setup choices and switch between them with minimal reconfiguration effort. We validate RIO on VLA deployment workflows across three morphologies (single-arm, bimanual, humanoid) and four hardware platforms with varying grippers and cameras. Using teleoperated data collected with RIO, we fine-tune state-of-the-art VLAs including $\pi_{0.5}$ and GR00T on household tasks such as pick-and-place, folding, and bowl scrubbing. By open-sourcing all our efforts, we hope the community can accelerate its pace of robot learning on real-world robot hardware. Additional details at: https://robot-i-o.github.io
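The abstract groups teleoperation, data formatting, and policy deployment around a shared robot I/O layer. One plausible shape for that, sketched here under stated assumptions rather than taken from RIO, is a fixed-rate read, decide, write loop in which the action source may be either a teleoperation mapping or a learned policy, and every step is logged in one standardized frame layout. The function `run_control_loop`, its parameters, and the frame keys (`step`, `timestamp`, `observation.state`, `action`) are hypothetical; on real hardware the `read_state` and `write_action` callables would wrap device drivers.

```python
import time
from typing import Any, Callable, Dict, List

Frame = Dict[str, Any]


def run_control_loop(
    read_state: Callable[[], List[float]],
    write_action: Callable[[List[float]], None],
    action_source: Callable[[List[float]], List[float]],
    hz: float,
    num_steps: int,
) -> List[Frame]:
    """Fixed-rate read -> decide -> write loop that records one frame per step.

    action_source may wrap a teleoperation device or a learned policy;
    the loop treats both identically.
    """
    if hz <= 0:
        raise ValueError("hz must be positive")
    period = 1.0 / hz
    episode: List[Frame] = []
    t0 = time.monotonic()
    next_tick = t0
    for step in range(num_steps):
        state = read_state()
        action = action_source(state)
        write_action(action)
        # Hypothetical standardized frame layout, identical for every embodiment.
        episode.append({
            "step": step,
            "timestamp": time.monotonic() - t0,
            "observation.state": list(state),
            "action": list(action),
        })
        # Sleep until the next absolute deadline so jitter does not accumulate as drift.
        next_tick += period
        remaining = next_tick - time.monotonic()
        if remaining > 0:
            time.sleep(remaining)
        else:
            next_tick = time.monotonic()  # overran the period: resync rather than burst
    return episode


if __name__ == "__main__":
    joints = [0.0, 0.0]  # fake hardware state shared by the two callables

    def read_state() -> List[float]:
        return list(joints)

    def write_action(action: List[float]) -> None:
        joints[:] = action

    def toy_policy(state: List[float]) -> List[float]:
        return [q + 0.01 for q in state]

    episode = run_control_loop(read_state, write_action, toy_policy, hz=50, num_steps=5)
    print(len(episode), episode[-1]["timestamp"])
```

Scheduling against absolute deadlines (`next_tick`) rather than sleeping a fixed period after each step keeps small per-step delays from accumulating into drift. This is soft real-time at best: `time.sleep` gives no hard timing guarantees, and a production system would need more than this sketch attempts.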

Problem

Research questions and friction points this paper is trying to address.

cross-embodiment

robot learning

infrastructure fragmentation

hardware interoperability

reusability

Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-embodiment

robot I/O

Vision-Language-Action models